Abstract

Despite known shortcomings of the procedure, exploratory
factor analysis of dichotomous test items has been limited, until
recently, to unweighted analyses of matrices of tetrachoric
correlations. Superior methods have begun to appear in the
literature, in professional symposia, and in computer programs.
This paper places these developments in a unified framework, from
a review of the classical common factor model for measured
variables through generalized least squares and marginal maximum
likelihood solutions for dichotomous data. Further extensions of
the model are also reported as work in progress.



Key words: binary variables, categorical data,
contingency tables, covariance structures,
factor analysis, item response theory,
latent structure, tetrachoric correlations
Recent Developments in the Factor
Analysis of Categorical Variables

1. Introduction
Under classical Thurstonian factor analysis (Thurstone,
1947), values of p measured variables are modeled as linear
functions of some smaller number m of continuous latent variables:
the "factors" that account for the correlations among the observed
variables. The usual objectives in factor analysis are to
determine the number of factors that provide a satisfactory fit to
the observed correlation matrix and to estimate the regression
coefficients of the observed variables on the factors, all of which,
it is hoped, leads to a more parsimonious and meaningful
explication of the patterns of interrelationship among the
observed variables.

Recent interest in item response theoretical (IRT) methods of
constructing and scoring tests (see, for example, Hambleton &
Cook, 1977; Lord, 1980; Wright & Stone, 1979) has led to a renewed
interest in the extension of classical factor analysis to
dichotomous test items. In the extension, the measured variables
of the classical formulation now play the role of latent response
processes to each of the items; a correct response is observed
only when the response process variable arising in the
confrontation of a given examinee with a given item exceeds a
latent threshold characterizing the item. (Modifications will
also be introduced to account for the possibility of random correct
responses, as can occur when the test directions encourage
examinees to guess on multiple-choice items.) While it is certain
that both the unidimensional models posited in most
applications of IRT and the multidimensional models of factor
analysts are strictly incorrect in any given application, a number
of benefits may accrue nonetheless. It is not unreasonable, for
example, to summarize into a single score the responses to a set of
items fairly well explained by a single dominant factor; but the
appearance of clusters of items separating clearly into multiple
factors suggests a need to consider reporting separate subtest
scores.

Early work along these lines proceeded by first obtaining the
matrix of tetrachoric correlations among the test responses, an
approximation of the correlation matrix among the latent response
processes of the various items under the assumption that they
follow a multivariate normal distribution. Those attempts ran into
difficulties: the occasional values of +1 and -1 that
result, the fact that matrices of sample tetrachorics are not
necessarily positive definite, the lack of statistical tests for
the number of factors, and the failure to account for the chance
successes that occur with multiple-choice items.

This paper reviews some recent work in factor analysis of
categorical variables. Emphasis is on the generalized least
squares (GLS) solution developed by Christoffersson (1975) and
Muthen (1978) and the maximum likelihood approach introduced by
Bock and Aitkin (1981). The section on the maximum likelihood
solution and its extensions draws upon recent work reported in a
symposium at the 1984 meeting of the Psychometric Society,
including papers by Bock (1984), Gibbons (1984a), Muraki (1984),
and Muthen (1984a). We focus for the most part on extensions of
the classical model, especially the normal case, for convenience
of presentation. Many of the recent developments have taken
place within this context, and it provides a unified framework of
exposition against which other models may be introduced in
contrast.

Section 2 provides a brief review of factor analysis of
measured variables, setting up notation and formulas in this more
familiar context. Section 3 introduces the common factor model
for dichotomous items. Sections 4 and 5 discuss estimation of
factor loadings from matrices of tetrachoric correlations,
unweighted (ULS) and weighted (GLS) respectively. Section 6
discusses a full information solution based on the method of
maximum likelihood. Finally, Section 7 outlines a number of
extensions to the basic model currently under investigation.
These include Bayesian priors on unique variances, confirmatory
factor analysis, comparisons of factor structures between groups,
and relaxation of assumptions about response functions and
population distributions.

2. Factor Analysis of Measured Variables

Factor analysis, at its heart, is a method of data explanation
through model-fitting. The matrix of covariances or correlations
among a large number of variables $y = (y_1, \ldots, y_p)'$ is the object
of analysis; it is hypothesized that the interrelationships
among the variables can be accounted for by a linear multiple
regression model, with the y's as dependent variables. The
distinguishing feature of factor analysis is that the predictors,
$\theta = (\theta_1, \ldots, \theta_m)'$, are not observed but must be inferred from the
data. In this section, we review the basic models and procedures
associated with factor analysis of measured variables. (For
readable introductions to the concepts of factor analysis, see
Harman (1976), Joreskog (1979), and Lawley & Maxwell (1971).)

2.1 The Common Factor Model

The classical factor analysis model for measured variables
assumes an m-dimensional latent variable $\theta = (\theta_1, \ldots, \theta_m)'$ in a
population of examinees. Without loss of generality, $\theta$ is assumed
to have mean 0. Observations on a random sample of N examinees,
however, consist not of values of $\theta$ but of values of p manifest
variables $y = (y_1, \ldots, y_p)'$, where p > m. It is assumed that y
depends stochastically upon $\theta$ through the following system of
linear equations:
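$$
\begin{aligned}
y_1 &= \lambda_{11}\theta_1 + \cdots + \lambda_{1m}\theta_m + e_1 \\
y_2 &= \lambda_{21}\theta_1 + \cdots + \lambda_{2m}\theta_m + e_2 \\
&\;\;\vdots \\
y_p &= \lambda_{p1}\theta_1 + \cdots + \lambda_{pm}\theta_m + e_p
\end{aligned}
$$

or, in matrix form,

$$y = \Lambda\theta + e. \qquad (2.1)$$

$\Lambda$ is typically referred to as the matrix of factor loadings. Let
$\Phi$ represent the covariance matrix of $\theta$ and let $\Psi$ represent the
covariance matrix of the residuals e. With e uncorrelated with $\theta$,
the covariance matrix $\Sigma$ of y is then given by

$$\Sigma = \Lambda\Phi\Lambda' + \Psi.$$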

Under the Thurstonian model, the residuals are assumed to be
uncorrelated, and the factor loadings and the factor covariance
matrix account entirely for the linear relationships among the
manifest variables. The elements of the diagonal matrix $\Psi$ are
typically referred to as the unique variances of the y's.
After incorporating constraints necessary to make the model
identified (see Section 2.2), it is possible to fit a given $\Sigma$ with
respect to $\Lambda$ and $\Psi$ without additional assumptions about the
distributions of y, $\theta$, or e (see, for example, Harman (1976) and
Thurstone (1947)). In order to facilitate the transition to the
discrete case, however, we shall introduce some distributional
assumptions and restrict consideration to statistical estimation
procedures. Suppose the residuals in Equation 2.1 are also assumed
to follow a multivariate normal distribution, $e \sim \mathrm{MVN}(0, \Psi)$. The
conditional distribution of y given $\theta$, that is, the distribution for a
specified examinee with $\theta = \theta_i$, may then be inferred as

$$(y \mid \theta_i, \Lambda, \Psi) \sim \mathrm{MVN}(\Lambda\theta_i, \Psi). \qquad (2.2)$$

Assuming further that $\theta \sim \mathrm{MVN}(0, \Phi)$, we may derive the
marginal distribution of y, or the distribution of y from an
examinee selected at random, by integrating Equation 2.2 over the
examinee population:

$$p(y \mid \Lambda, \Phi, \Psi) = \int_{\theta} p(y \mid \theta, \Lambda, \Psi)\, p(\theta \mid \Phi)\, d\theta. \qquad (2.3)$$

Since both densities under the integral are normal, the
integration can be carried out explicitly. We find that

$$y \sim \mathrm{MVN}(0, \Sigma), \qquad (2.4)$$

where again

$$\Sigma = \Lambda\Phi\Lambda' + \Psi. \qquad (2.5)$$



2.2 Parameter Estimation



The likelihood function for the responses of a random sample
of N examinees under Equation 2.4 is given by

$$L = \prod_{i=1}^{N} \frac{\exp\left(-y_i'\Sigma^{-1}y_i/2\right)}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}. \qquad (2.6)$$



Maximizing Equation 2.6 with respect to the parameter matrices
$\Lambda$, $\Phi$, and $\Psi$ proceeds by taking the log of Equation 2.6,
differentiating with respect to each parameter, equating the
results to zero, and then finding parameter values that satisfy these
so-called likelihood equations. Unique estimates of the parameters
do not exist, however, unless additional side restrictions are
imposed along with Equation 2.6 in order to set the scales and
orientations of the latent $\theta$'s. It is typical to require that
$\Phi = I_m$, the identity matrix of order m, and, in maximum
likelihood estimation, that $\Lambda'\Psi^{-1}\Lambda$ be diagonal.

The maximum likelihood (ML) estimation procedure described in
the preceding paragraph takes the form of minimizing a fitting
function that differs from the negative log likelihood only by an
additive constant, namely

$$F = \frac{N}{2}\left[\operatorname{tr}(\Sigma^{-1}S) - \log\left|\Sigma^{-1}S\right|\right], \qquad (2.7)$$

where S is the observed covariance matrix among the y's (computed
about the zero mean assumed by the model). It is
important to note that the product over N examinees that appears
in the likelihood simplifies to expressions that involve only
a summary of the response vectors, in terms of the observed
covariance matrix. In other words, fully efficient estimates of
$\Lambda$ and $\Psi$ can be obtained by utilizing only the p(p + 1)/2 distinct elements
of S; no information is lost by collapsing over the
response patterns of N examinees, no matter how large N may be
compared to p(p + 1)/2.
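
To make Equation 2.7 concrete, here is a minimal NumPy sketch of the ML fitting function. It assumes, for illustration, that S is the cross-product matrix Y'Y/N about the model's zero mean and that Ψ is passed as a vector of unique variances; the function names are our own, not those of any factor analysis program. The usage lines evaluate F at S = Σ, where it attains its lower bound Np/2.

```python
import numpy as np

def implied_cov(Lam, Phi, psi):
    """Sigma = Lam Phi Lam' + Psi (Equation 2.5); psi holds the diagonal of Psi."""
    return Lam @ Phi @ Lam.T + np.diag(psi)

def ml_discrepancy(Lam, Phi, psi, S, N):
    """F = (N/2)[tr(Sigma^{-1} S) - log|Sigma^{-1} S|] (Equation 2.7)."""
    A = np.linalg.solve(implied_cov(Lam, Phi, psi), S)  # Sigma^{-1} S, no explicit inverse
    _, logdet = np.linalg.slogdet(A)                    # log|Sigma^{-1} S|
    return 0.5 * N * (np.trace(A) - logdet)

# Hypothetical one-factor model for four standardized variables
Lam = np.array([[0.8], [0.7], [0.6], [0.5]])
Phi = np.eye(1)
psi = 1.0 - (Lam ** 2).ravel()
Sigma = implied_cov(Lam, Phi, psi)
print(ml_discrepancy(Lam, Phi, psi, Sigma, N=500))  # about N * p / 2 = 1000 when S equals Sigma
```

Because tr(A) - log|A| is the sum of λ - log λ over the (positive) eigenvalues λ of A = Σ^{-1}S, and each term is at least 1, F never falls below Np/2; an estimation routine would minimize it over Λ and Ψ subject to the constraints above.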

For later reference, we also mention two additional methods
of estimating $\Lambda$ and $\Psi$. Both proceed by making the fitted $\Sigma$, or
the function of $\Lambda$ and $\Psi$ given in Equation 2.5, close to S in
some sense. Let the superscript $^\circ$ denote the "matrix stacking" operator, which
strings the columns of a p × p matrix X into a single column vector,
$X^\circ = (x_1, x_2, \ldots, x_{p^2})'$. The fitting methods are unweighted least squares
(ULS), which minimizes

$$F = (S^\circ - \Sigma^\circ)'(S^\circ - \Sigma^\circ), \qquad (2.8)$$

or the component-by-component sum of squared differences between
the elements of S and $\Sigma$; and generalized least squares (GLS), which
minimizes

$$F = (S^\circ - \Sigma^\circ)'\,W^{-1}(S^\circ - \Sigma^\circ), \qquad (2.9)$$

or the sum of squared differences between elements of S and $\Sigma$,
but weighted in a manner that takes into account the precision and
the possibility of correlated errors in the estimation of S.
In principle, the correct weight matrix required for a rigorous
GLS solution is $W = \Sigma \otimes \Sigma$, where $\otimes$ represents the Kronecker or
direct product of matrices. In practice, the consistent estimator
$S \otimes S$ is used.
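
The two least squares criteria can be sketched the same way. This is an illustrative NumPy version, assuming column-wise stacking for the ° operator and the practical weight W = S ⊗ S; instead of forming the p² × p² Kronecker product, it uses the algebraically equivalent trace form tr[(S^{-1}(S - Σ))²]. The matrices in the usage lines are hypothetical.

```python
import numpy as np

def stack(X):
    """Matrix stacking operator: the columns of X strung into one vector."""
    return X.reshape(-1, order="F")

def uls_discrepancy(S, Sigma):
    """Equation 2.8: (S° - Sigma°)'(S° - Sigma°)."""
    d = stack(S) - stack(Sigma)
    return float(d @ d)

def gls_discrepancy(S, Sigma):
    """Equation 2.9 with W = S (x) S, evaluated as tr[(S^{-1}(S - Sigma))^2]
    so that the p^2 x p^2 weight matrix is never formed."""
    R = np.linalg.solve(S, S - Sigma)
    return float(np.trace(R @ R))

S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
Sigma = np.array([[1.0, 0.45, 0.40],
                  [0.45, 1.0, 0.36],
                  [0.40, 0.36, 1.0]])
print(uls_discrepancy(S, Sigma), gls_discrepancy(S, Sigma))
```

Only S must be inverted in the GLS criterion, which is why it, like ML, requires a positive definite S, while ULS does not.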

For formal justification and computational details on each of the
three fitting methods, the reader is referred to Anderson (1959),
Browne (1974/1977), Joreskog (1967, 1977), Joreskog and
Goldberger (1972), and Lawley and Maxwell (1971). We merely
mention a number of properties that are relevant to our
presentation:

1. All three methods provide consistent estimates of $\Lambda$ and $\Psi$
under the assumptions noted at the beginning of this
section.

2. Both ML and GLS require positive definite matrices S; 
ULS does not. 

3. ML and GLS provide large-sample chi-square tests of 
model fit. Moreover, the difference between the chi- 
squares of nested models (e.g., a three-factor model 
versus a two-factor model) itself follows a chi-square 
distribution, with degrees of freedom equal to the 
number of additional parameters estimated in the less 
restrictive model, when the more restrictive model is 
correct. Thus, rigorous tests for the number of factors 
are available. 
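
Property 3 reduces to bookkeeping over parameters. A small sketch, assuming an exploratory model with Φ = I and the usual m(m - 1)/2 rotational constraints: the degrees of freedom are the p(p + 1)/2 distinct elements of S minus the free parameters in Λ and Ψ. The helper name `efa_df` is hypothetical.

```python
def efa_df(p, m):
    """Degrees of freedom of the large-sample chi-square test for an
    m-factor model fitted to p variables by ML or GLS."""
    moments = p * (p + 1) // 2              # distinct elements of S
    free = p * m + p - m * (m - 1) // 2     # Lambda and Psi, less rotational constraints
    return moments - free

p = 10
for m in (1, 2, 3):
    print(m, efa_df(p, m))
print("df for the 1- vs 2-factor difference test:", efa_df(p, 1) - efa_df(p, 2))
```

Moving from m to m + 1 factors adds p - m free parameters, which is the degrees of freedom of the chi-square difference test; for p = 10 the one- versus two-factor comparison has 9.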

2.3 Rotation of a Solution

The solution provided by any of these procedures is
unique, but it is determined in part by the arbitrary imposition of
rotational constraints, i.e., $\Lambda'\Psi^{-1}\Lambda$ diagonal in GLS and ML, $\Lambda'\Lambda$ diagonal
in ULS, and $\Phi = I$ in all three. It is easily seen that infinitely
many other solutions for $\Lambda$ and $\Phi$ would combine through
Equation 2.5 to produce the same $\Sigma$. Let T be a nonsingular square
matrix of order m, normalized so that the transformed factors retain
unit variance. If $\Sigma = \Lambda\Phi\Lambda' + \Psi$, then it is
also true that $\Sigma = \Lambda^*\Phi^*\Lambda^{*\prime} + \Psi$, where $\Lambda^* = \Lambda T^{-1}$ and $\Phi^* = T\Phi T'$.
Various choices of T, though leaving the factor solution essentially
unchanged in terms of model fit, can produce patterns of factor
loadings that are easier to scan visually or to interpret
substantively. The process of obtaining values of $\Lambda^*$ and $\Phi^*$ is
called factor rotation. Attention may be restricted to those T's
that keep off-diagonal elements of $\Phi^*$ at zero (orthogonal rotations)
or extended to those that do not (oblique rotations). (See Harman (1976) and
Thurstone (1947) for lucid explanations of rotation.)
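
The indeterminacy can be checked numerically. In this sketch (arbitrary loadings, and two illustrative transformation matrices of our own choosing), an orthogonal T keeps Φ* = I, while a T with unit-length rows yields an oblique solution whose factors correlate .6; both reproduce Σ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 6, 2
Lam = rng.uniform(0.3, 0.7, size=(p, m))      # hypothetical loadings
Phi = np.eye(m)
Psi = np.diag(rng.uniform(0.2, 0.5, size=p))  # hypothetical unique variances
Sigma = Lam @ Phi @ Lam.T + Psi

def rotate(Lam, Phi, T):
    """Return Lam* = Lam T^{-1} and Phi* = T Phi T'; Lam* Phi* Lam*' equals Lam Phi Lam'."""
    return Lam @ np.linalg.inv(T), T @ Phi @ T.T

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
T_orth = np.array([[c, -s], [s, c]])        # orthogonal: Phi* stays the identity
T_obl = np.array([[1.0, 0.0], [0.6, 0.8]])  # unit-length rows: Phi* has unit diagonal, r = .6
for T in (T_orth, T_obl):
    Lam_s, Phi_s = rotate(Lam, Phi, T)
    print(np.allclose(Lam_s @ Phi_s @ Lam_s.T + Psi, Sigma), np.round(Phi_s, 3))
```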
2.4 Heywood Cases 

It is possible to construct correlation matrices that conform to
the common factor model but for which one or more unique variances
take the value of zero (Heywood, 1931). Zero uniquenesses correspond
to measured variables falling completely within the factor space,
or being explained perfectly by the latent variables, without
measurement error at all. Negative uniquenesses are not defined
within the usual context. Solutions with nonpositive uniquenesses are
not generally palatable in practice.

Two approaches to dealing with these so-called Heywood
solutions have been proposed in the literature. One is to allow
such solutions, with the nonpositive uniqueness taken as a possible
warning of model misfit (Joreskog & Sorbom, 1980). In exploratory
factor analysis, a Heywood solution may indicate that one is
attempting to fit a model with too many (or, occasionally, too few)
factors, that one or more factors are poorly identified by the
current set of observed variables, or, if sample size is small,
that unfavorable sampling fluctuations have occurred. Appropriate
remedies would be to fit a simpler model or to obtain data on
additional variables and/or subjects. A second approach is to
constrain estimation to solutions with only positive (or possibly
only non-negative) unique variances. This may be done by imposing
upon unique variances either arbitrary constraints (e.g.,
Christoffersson, 1975, p. 9) or formal Bayesian prior distributions
(e.g., Lee, 1981; Martin & McDonald, 1975).
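
A simple screen for Heywood cases computes the unique variances implied by a set of loadings. This sketch assumes standardized variables, so that Ψ = I - diag(ΛΦΛ'); the loadings are made up for illustration, and the code only flags offending variables rather than imposing either of the remedies described above.

```python
import numpy as np

def implied_uniquenesses(Lam, Phi=None):
    """Unique variances implied for standardized variables: 1 - diag(Lam Phi Lam')."""
    if Phi is None:
        Phi = np.eye(Lam.shape[1])
    communalities = np.einsum("ij,jk,ik->i", Lam, Phi, Lam)
    return 1.0 - communalities

def heywood_variables(Lam, Phi=None, tol=1e-6):
    """Indices of variables whose implied uniqueness is zero or negative."""
    return np.flatnonzero(implied_uniquenesses(Lam, Phi) <= tol)

Lam = np.array([[0.9, 0.5],
                [0.5, 0.4],
                [0.6, 0.3]])
print(implied_uniquenesses(Lam))  # first entry is approximately 1 - (0.81 + 0.25) = -0.06
print(heywood_variables(Lam))     # flags variable 0
```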

3. A Common Factor Model for Dichotomous Data 
In this section we outline the extension of the multiple 
factor model to dichotomous data. Attention is focused upon 
dichotomies which are reasonably considered to have arisen from a 
continuous latent process, but through observational constraints 
produce only dichotomous responses. Examples of this type would 
include right/wrong responses to test items, for/against votes on a 
referendum, and satisfied/dissatisfied judgments about a product. 
The model is also relaxed to allow for a fixed rate of "false
positive" responses, as might occur when examinees can respond
correctly to test items through lucky guesses as well as through the
aptitude of interest.
There is no impediment to computing Pearson product-moment
correlations among dichotomous variables ("phi coefficients," as
they are called in this special case), and it might seem natural
to apply the methods of the previous section to fit factor analytic
models to correlation matrices so obtained. Several writers,
however, have demonstrated dangers inherent in such an undertaking.

One problem is that the values of phi coefficients
depend not only upon the strength of relationship among variables,
but upon the means of the individual variables as well (Carroll,
1945, 1983). In the limiting case of two dichotomous variables with
a perfect Guttman ordering, the value of the correlation obtained by
Pearson's formula depends solely upon the means of the two variables
and attains the value of 1 only when both variables have equal
means.
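
The dependence of phi on item means is easy to see under a perfect Guttman ordering, where the joint proportion correct equals the smaller of the two proportions correct. The following Python sketch (function names are illustrative) computes phi from the 2 × 2 proportions; it reaches 1 only when p_j = p_k, and otherwise, with item k the harder item, equals the square root of p_k(1 - p_j) / [p_j(1 - p_k)].

```python
import math

def phi_coefficient(p_jk, p_j, p_k):
    """Pearson (phi) correlation of two 0/1 items from the joint proportion correct
    p_jk and the item proportions correct p_j and p_k."""
    return (p_jk - p_j * p_k) / math.sqrt(p_j * (1 - p_j) * p_k * (1 - p_k))

def guttman_phi(p_j, p_k):
    """Phi under a perfect Guttman ordering: everyone who passes the harder item
    also passes the easier one, so the joint proportion is min(p_j, p_k)."""
    return phi_coefficient(min(p_j, p_k), p_j, p_k)

for p_j, p_k in [(0.5, 0.5), (0.7, 0.5), (0.9, 0.3)]:
    print(p_j, p_k, round(guttman_phi(p_j, p_k), 3))
```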

A second problem is that the value of a dichotomous variable
is bounded, implying that its regression on any continuous latent
variable with infinite range cannot be linear (McDonald & Ahlawat,
1974). If applied directly to correlations from dichotomous
variables, the linear factor analysis model given by Equation
2.1 is misspecified from the start and potentially misleading,
because the best linear approximation to a true curvilinear
relationship will depend on the region in which the data are most
informative. In other words, the estimated linear relationship
will depend upon the mean of the binary variable.

A third problem is illustrated in Mooijaart's (1983)
approximation of the covariance between two discretized variables
(e.g., a phi coefficient) in terms of a factor model for the underlying
continuous variables and functions of the observed discrete
variables. In the special cases of either (a) all low factor
loadings in the underlying model or (b) all discrete variables
having means near .5, a factor model with the same number of
factors but rescaled loadings will provide a good fit to the phi
coefficients. In general, however, the expression for phi
coefficients is augmented by terms that depend on the skewness of
the discrete variables, which, with binary variables, is a direct
function of their mean values. Additional factors may be required
to fit the phi matrix when these additional terms are large and
their patterns are unfavorable.

When binary variables are produced by dichotomizing continuous
variables, then, the choice of cutting points materially affects
the values of the expected phi coefficients. Factor analyses of
phi coefficients of binary variables produced by the same
underlying correlational structure but dichotomized at different
points can conform to factor models with different structures and
possibly different numbers of factors. For these reasons, we shall
not discuss the analysis of phi coefficients, but rather confine
our attention to models and methods under which strength of
relationship and mean level are not confounded.
3.1 The Model

As in the classical model, we posit m latent variables $\theta$. In
the case of p > m observed responses (e.g., to a p-item test), we
also posit the corresponding structure on p "response process"
variables $y = (y_1, \ldots, y_p)'$:

$$y_j = \lambda_{j1}\theta_1 + \cdots + \lambda_{jm}\theta_m + \epsilon_j,$$

where $\epsilon_j$ is a residual, the density of which will be specified
presently. In contrast to factor analysis of measured
variables, however, we do not observe y directly. Instead, we
observe a vector of dichotomous variables $x = (x_1, \ldots, x_p)'$ with
values determined in the following manner:

$$x_j = \begin{cases} 1 & \text{if } y_j \ge \gamma_j \\ 0 & \text{if } y_j < \gamma_j \end{cases}$$

where $\gamma_j$ is a value associated with item j, its "threshold"
parameter. (The model will be relaxed in a following section to
allow for the possibility of random positive responses.) Let
$\Gamma$ denote $(\gamma_1, \ldots, \gamma_p)$.
Suppose that the residuals $\epsilon_j$ are distributed as $N(0, \sigma_j^2)$
and are independent over items and examinees. We shall denote the
diagonal matrix $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$, or the vector of unique variances,
as $\Psi$. The conditional probability of a correct response from
examinee i to item j is then given as

$$P(x_{ij} = 1 \mid \theta_i) = \int_{\gamma_j}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{1}{2}\left(\frac{v - \sum_s \lambda_{js}\theta_{si}}{\sigma_j}\right)^2\right] dv = F\left(\frac{\sum_s \lambda_{js}\theta_{si} - \gamma_j}{\sigma_j}\right) \equiv F_j(\theta_i), \qquad (3.2)$$

where F is the standard normal distribution function.
Equation 3.2 will be recognized as a multivariate generalization
of the two-parameter normal item response model (Lawley, 1943,
1944; Lord, 1952). Connections between the two models are
explored in Lord and Novick (1968, Chapter 24).
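
Equation 3.2 can be checked against the latent response formulation it comes from. This sketch evaluates the normal ogive with Python's `math.erfc` and compares it with the proportion of simulated y_j = λ_j'θ + ε_j at or above γ_j for one fixed examinee; the parameter values and function names are arbitrary illustrations.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal distribution function F."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_correct(theta, lam_j, gamma_j, sigma_j):
    """Equation 3.2: P(x_ij = 1 | theta_i) = F((sum_s lam_js theta_si - gamma_j) / sigma_j)."""
    return std_normal_cdf((float(np.dot(lam_j, theta)) - gamma_j) / sigma_j)

# One hypothetical examinee and item
theta = np.array([0.5, -0.2])
lam_j, gamma_j, sigma_j = np.array([0.6, 0.3]), 0.1, 0.7

# Simulate the latent response process and dichotomize at the threshold
rng = np.random.default_rng(42)
y = lam_j @ theta + rng.normal(0.0, sigma_j, size=200_000)
print(prob_correct(theta, lam_j, gamma_j, sigma_j), np.mean(y >= gamma_j))
```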

Suppose it is further assumed that $\theta \sim \mathrm{MVN}(0, \Phi)$
in a population of interest. As in the classical model, it
follows that the marginal distribution of y is $\mathrm{MVN}(0, \Sigma)$, where
again $\Sigma = \Lambda\Phi\Lambda' + \Psi$. The fact that neither $\theta$ nor y is observed
introduces indeterminacies of scale and orientation into the
model; we shall begin to resolve them by specifying $\Phi = I_m$ and
$\Sigma_{jj} = 1$ for each j. This implies that

$$\Sigma = \Lambda\Lambda' + \Psi \qquad (3.3)$$

and

$$\sigma_j^2 = 1 - \sum_{s=1}^{m} \lambda_{js}^2,$$

or, in matrix notation,

$$\Psi = I - \mathrm{diag}(\Lambda\Lambda').$$

Let $x_i = (x_{i1}, \ldots, x_{ip})$ be the vector of 0/1 responses from
examinee i in a randomly selected sample of size N. The marginal
likelihood of the data is given by

$$L(x_1, \ldots, x_N \mid \Lambda, \Gamma) = \prod_{i=1}^{N} \int_{\theta} P(x_i \mid \theta, \Lambda, \Gamma)\, f(\theta)\, d\theta = \prod_{i=1}^{N} \int_{\theta} \prod_{j=1}^{p} F_j(\theta)^{x_{ij}} \left[1 - F_j(\theta)\right]^{1 - x_{ij}} f(\theta)\, d\theta, \qquad (3.4)$$
where f(θ) represents the standard MVN density function. Equation
3.4 can also be written as a product over s distinct response
patterns $x_l$, observed with frequencies $r_l$, as

$$L = \prod_{l=1}^{s} \left\{ \int_{\theta} \prod_{j=1}^{p} F_j(\theta)^{x_{lj}} \left[1 - F_j(\theta)\right]^{1 - x_{lj}} f(\theta)\, d\theta \right\}^{r_l}, \qquad (3.5)$$

where $s \le \min(N, 2^p)$. In contrast to the solution in which y's
are observed directly, (3.5) cannot be collapsed further.
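
The integral in Equation 3.4 has no closed form, so it must be approximated numerically. A minimal sketch for a single factor uses Gauss-Hermite quadrature (NumPy's `hermegauss`, whose weight function is exp(-t²/2)); the choice of 40 nodes, the restriction to m = 1 without guessing, and the log-sum-exp accumulation are our assumptions, not details given in the text.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def marginal_loglik(X, lam, gamma, n_points=40):
    """log L of Equation 3.4 for a one-factor model (m = 1, theta ~ N(0, 1)), no guessing.
    X: N x p array of 0/1 responses; lam, gamma: length-p loadings and thresholds."""
    X = np.asarray(X, dtype=float)
    lam, gamma = np.asarray(lam, dtype=float), np.asarray(gamma, dtype=float)
    nodes, weights = hermegauss(n_points)               # rule for weight exp(-t^2 / 2)
    log_w = np.log(weights / math.sqrt(2.0 * math.pi))  # weights now sum to 1
    sigma = np.sqrt(1.0 - lam ** 2)                     # unique SDs, Section 3.1 scaling
    z = (np.outer(nodes, lam) - gamma) / sigma          # Q x p arguments of F
    erfc = np.vectorize(math.erfc)
    F = 0.5 * erfc(-z / math.sqrt(2.0))                 # F_j(theta_q), Equation 3.2
    Fbar = 0.5 * erfc(z / math.sqrt(2.0))               # 1 - F_j(theta_q), no cancellation
    tiny = 1e-300                                       # guards log(0) at extreme nodes
    # log P(x_i | theta_q) for every examinee i and node q (N x Q)
    log_cond = X @ np.log(np.maximum(F, tiny)).T + (1.0 - X) @ np.log(np.maximum(Fbar, tiny)).T
    a = log_cond + log_w
    amax = a.max(axis=1, keepdims=True)                 # log-sum-exp over nodes
    log_marg = amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))
    return float(log_marg.sum())

X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 0]]        # hypothetical responses
print(marginal_loglik(X, lam=[0.7, 0.6, 0.5], gamma=[-0.2, 0.3, -0.5]))
```

The value is unchanged if identical rows of X are evaluated once and their log marginal probabilities are weighted by the pattern frequencies, which is the content of Equation 3.5.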

This fact has important implications for parameter
estimation. It can be known a priori, for example, that the
information about Σ contained in observed values of y from one
million examinees to 100 items can be summarized without loss as
a covariance matrix with just 5,050 distinct elements. If responses are to
100 dichotomous items, however, a total of $2^{100}$ distinct response
patterns are possible; even allowing for the fact that many of
these patterns will not occur in any given sample, hundreds of
thousands of distinct pieces of data must be maintained to produce
fully efficient estimates of Λ and Γ. To put it another way,
the information in all cells of the $2^p$ contingency table of
responses to all items is required for fully efficient estimation
of parameters in the factor model.
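
The contrast between the two kinds of summary can be made concrete by counting. The sketch below (hypothetical random data; the helper name is ours) collapses a 0/1 response matrix into its distinct patterns and their frequencies, the quantities in Equation 3.5, and then prints the number of distinct covariance elements and of possible patterns for 100 items.

```python
from collections import Counter
import numpy as np

def response_patterns(X):
    """Collapse an N x p matrix of 0/1 responses into its s distinct patterns x_l
    and their frequencies r_l."""
    counts = Counter(tuple(row) for row in np.asarray(X, dtype=int).tolist())
    patterns = np.array(sorted(counts))
    freqs = np.array([counts[tuple(row)] for row in patterns.tolist()])
    return patterns, freqs

rng = np.random.default_rng(7)
X = (rng.random((1000, 10)) < 0.6).astype(int)  # hypothetical: 1000 examinees, 10 items
patterns, freqs = response_patterns(X)
N, p = X.shape
print(len(patterns), "distinct patterns; bound min(N, 2^p) =", min(N, 2 ** p))
print("distinct covariance elements for 100 items:", 100 * 101 // 2)
print("possible response patterns for 100 items:", 2 ** 100)
```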
3.2 Accounting for Random Correct Responses

For the purposes of test analysis and construction, a
useful extension to the model described above is to account for
the correct responses that result from correct guesses to
multiple-choice items. Under these circumstances, the
probabilities of correct response from even examinees of very low
ability do not approach the value of zero implied by Equation 3.2.
Failure to take these effects into account can produce analyses
that are misleading not only as to the elements of Λ and Γ but also as
to the number of factors needed to account for the data (Carroll,
1983).

It is possible to allow for chance success on item j at the
rate of $g_j$ by taking

$$F_j(\theta) = g_j + (1 - g_j)F_j^*(\theta),$$

where $F_j^*(\theta)$ is the function of $\lambda_j$ and $\gamma_j$ given in Equation 3.2,
which accounts for the rate of success produced by the latent
factors of interest. No further revisions are required in
Equations 3.3-3.5, although the following sections will consider
implications of this extension for estimation procedures.
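
The guessing extension changes only the item response function. A sketch, assuming one factor in the usage lines and the standardization σ_j² = 1 - Σ_s λ_js² of Section 3.1 (the function name is illustrative); as θ falls, the probability levels off at g_j rather than at 0.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_correct_guessing(theta, lam_j, gamma_j, g_j):
    """F_j(theta) = g_j + (1 - g_j) F*_j(theta), with F*_j the normal-ogive function
    of Equation 3.2 and sigma_j^2 = 1 - sum(lam_j^2)."""
    sigma_j = math.sqrt(1.0 - sum(l * l for l in lam_j))
    z = (sum(l * t for l, t in zip(lam_j, theta)) - gamma_j) / sigma_j
    return g_j + (1.0 - g_j) * std_normal_cdf(z)

for t in (-4.0, 0.0, 4.0):
    print(t, round(prob_correct_guessing([t], [0.7], 0.0, 0.25), 4))
```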
4. An Unweighted Least Squares Solution

Under the model of Section 3 for binary responses that arise
from the dichotomization of underlying MVN response process
variables, without the possibility of false positive responses due
to guessing effects, it is possible to write the expected
proportion of correct responses to a given item j as

$$P_j = \int_{\gamma_j}^{\infty} f(z)\, dz \qquad (4.1)$$

and the expected proportion of persons responding correctly to both items
j and k as

$$P_{jk} = \int_{\gamma_j}^{\infty} \int_{\gamma_k}^{\infty} f(z_1, z_2 \mid \sigma_{jk})\, dz_2\, dz_1, \qquad (4.2)$$

where f denotes a standard normal density function, univariate or
bivariate as appropriate, and $\sigma_{jk}$ denotes the correlation between
response process variables $y_j$ and $y_k$. Denoting the expected
proportion of examinees answering item j correctly but item k
incorrectly as $P_{j\bar{k}}$, and defining $P_{\bar{j}k}$ and $P_{\bar{j}\bar{k}}$ analogously, we
could write expressions similar to Equation 4.2 for each.
$P_{jk}$, $P_{j\bar{k}}$, $P_{\bar{j}k}$, and $P_{\bar{j}\bar{k}}$ are the expected proportions of response in a
two-by-two contingency table.
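
The four expected cell proportions follow from Equations 4.1 and 4.2. This sketch reduces the double integral in Equation 4.2 to a one-dimensional integral over y_j, using the conditional normal distribution of y_k given y_j, and applies Simpson's rule; the grid size and truncation point are illustrative choices, and the function requires |σ_jk| < 1.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def joint_upper(gamma_j, gamma_k, rho, n=4001, span=10.0):
    """Equation 4.2, P_jk = P(y_j >= gamma_j, y_k >= gamma_k), for standard bivariate
    normal (y_j, y_k) with correlation rho, via the 1-D form
    int_{gamma_j}^{inf} f(z) F((rho z - gamma_k) / sqrt(1 - rho^2)) dz,
    truncated at max(gamma_j, 0) + span and integrated by Simpson's rule (n odd)."""
    z = np.linspace(gamma_j, max(gamma_j, 0.0) + span, n)
    dens = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    scale = math.sqrt(1.0 - rho ** 2)
    cond = np.array([std_normal_cdf((rho * zi - gamma_k) / scale) for zi in z])
    f = dens * cond
    h = z[1] - z[0]
    return float(h / 3.0 * (f[0] + f[-1] + 4.0 * f[1:-1:2].sum() + 2.0 * f[2:-1:2].sum()))

def expected_cells(gamma_j, gamma_k, rho):
    """Expected proportions (P_jk, P_j kbar, P_jbar k, P_jbar kbar) of the 2 x 2 table."""
    P_j = std_normal_cdf(-gamma_j)  # Equation 4.1
    P_k = std_normal_cdf(-gamma_k)
    P_jk = joint_upper(gamma_j, gamma_k, rho)
    return P_jk, P_j - P_jk, P_k - P_jk, 1.0 - P_j - P_k + P_jk

print(expected_cells(0.0, 0.0, 0.5))
```

For γ_j = γ_k = 0 the orthant formula 1/4 + arcsin(σ_jk)/(2π) gives an independent check; with σ_jk = .5 it equals 1/3.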
From the observed proportion $p_j$, $\gamma_j$ may be estimated via
Equation 4.1 by

$$\hat{\gamma}_j = F^{-1}(1 - p_j),$$

where $F^{-1}$ is the inverse of the cumulative standard normal
distribution. Given estimates of $\gamma_j$ and $\gamma_k$ and the four entries
in the two-by-two table of joint response frequencies, it is
possible to estimate $\sigma_{jk}$ via Equation 4.2. The resulting value
is called the sample tetrachoric correlation coefficient
(Pearson, 1900); efficient computing approximations are given by
Divgi (1979). Let S* be the matrix of (sample) tetrachoric
correlations among a set of p test items, with responses generated
in accordance with the no-guessing model of Section 3.
4.1 Unweighted Analysis of S*

Now S* is an estimate of Σ, the correlation matrix among the
latent y's, which has the common factor structure given in Equation
3.3. Standard procedures for factor analysis of measured variables
(Section 2) may be employed, then, to estimate Λ. Before
proceeding, however, two points require attention. First, the
sample tetrachoric takes a value of -1 or +1 whenever one of the four
cells of the two-by-two table for a pair of items is empty. This problem is remedied in practice by adding a
small number to each cell in the two-by-two contingency table for
each pair of items, in effect placing a mild Dirichlet prior
distribution on the joint proportions of response as in Fienberg
and Holland (1970). Second, unlike a true correlation matrix or
even a sample correlation matrix, S* is not necessarily positive
definite. This fact typically rules out analysis by ML or GLS,
leaving ULS. That is, Λ is estimated by minimizing the quantity

$$F = \sum_{j} \sum_{k<j} \left(s^*_{jk} - \sigma_{jk}\right)^2,$$

where, by Equation 3.3, $\sigma_{jk} = \sum_s \lambda_{js}\lambda_{ks}$.
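
For the ULS criterion, one common computational route is iterated principal-axis factoring, which re-estimates the communalities on the diagonal each cycle; at convergence its loadings are a stationary point of the off-diagonal sum of squares. This is a sketch under that assumption, not a description of any particular program, and the starting communalities (largest absolute off-diagonal entry in each row) are an illustrative choice. It uses only eigenvalues of the reduced matrix, so S* need not be positive definite.

```python
import numpy as np

def uls_factor(R, m, max_iter=5000, tol=1e-12):
    """Iterated principal-axis factoring of a correlation-type matrix R (such as S*)."""
    R = np.array(R, dtype=float)
    off = np.abs(R - np.diag(np.diag(R)))
    h2 = off.max(axis=1)                      # starting communalities
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)         # communalities replace the unit diagonal
        vals, vecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:m]
        Lam = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (Lam ** 2).sum(axis=1)
        done = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if done:
            break
    return Lam * np.where(Lam.sum(axis=0) < 0, -1.0, 1.0)  # fix arbitrary column signs

# Hypothetical S* generated exactly by one factor
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
Lam = uls_factor(R, m=1)
print(np.round(Lam.ravel(), 4))
```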
4.2 Advantages and Disadvantages of the ULS Solution

The advantages of ULS solutions for factor models for
dichotomous variables are, first, their superiority over factor
analysis of phi coefficients and, second, their relative economy:
solutions by methods for the measured-variables case generally require far
less computation than the methods specifically designed for
categorical data, as outlined in subsequent sections of this
presentation.

The disadvantages of this solution can be classified into two
categories. The first category arises in the attempt to compute S*.
Extreme values will be poorly determined, and those that would
have been +1 or -1 take values that depend on the choice of an ad
hoc remedy. And because estimation error is introduced in the
production of S*, the statistical theory for obtaining ULS
standard errors (Browne, 1974/1977) does not hold. The second
category arises from the fact that, unlike the case of normally
distributed measured variables, summarization of dichotomous
variables in terms of a covariance matrix does not retain all the
information about their joint relationships. Only the information
in the one-way marginals (percents correct) and two-way marginals
is used. Computational efficiency is thus achieved at the
sacrifice of information.
4.3 Adjustments When Guessing Is Present

The preceding discussion considered the case in which
responses were determined solely through θ, not accounting for the
possibility of chance successes. The same solution can be carried
out when chance successes do occur, at prespecified rates $g_j$ for
each of the items, if the observed proportions and joint
proportions are adjusted appropriately. Carroll (1945) and
Samejima (in Green et al., 1982, p. 28) give formulas for this
purpose. Jensema's (1976) expression for adjusted percents
correct and Samejima's expressions for joint proportions are shown
below. Observed values are indicated by asterisks; the adjusted
values are subsequently used in Equations 4.1 and 4.2.

$$p_j = \frac{p_j^* - g_j}{\bar{g}_j}$$

$$p_{jk} = p_{jk}^* - \frac{g_k}{\bar{g}_k}\, p_{j\bar{k}}^* - \frac{g_j}{\bar{g}_j}\, p_{\bar{j}k}^* + \frac{g_j g_k}{\bar{g}_j \bar{g}_k}\, p_{\bar{j}\bar{k}}^*$$

where $\bar{g}_j = 1 - g_j$ and $\bar{g}_k = 1 - g_k$. These adjustments can
produce proportions above 1 or below 0. Ad hoc remedies, such
as the imposition of arbitrary floors and ceilings on either
proportions or values of $g_j$, are then required before the
estimation of the factor model can begin.
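
The adjustments can be verified by a round trip. This sketch assumes, in line with Section 3.2, that an examinee lacking the latent success on item j answers correctly with probability g_j, independently across items; `observed_cells` is a hypothetical helper that builds the observed 2 × 2 proportions from known ones, and the two adjustment functions recover them. The last line shows how a low observed proportion yields a negative adjusted value.

```python
def observed_cells(P11, P10, P01, P00, g_j, g_k):
    """Observed 2 x 2 proportions when examinees without the latent success on
    item j (k) answer it correctly at rate g_j (g_k), independently."""
    gbj, gbk = 1.0 - g_j, 1.0 - g_k
    return (P11 + g_k * P10 + g_j * P01 + g_j * g_k * P00,  # both correct
            gbk * (P10 + g_j * P00),                       # j correct, k incorrect
            gbj * (P01 + g_k * P00),                       # j incorrect, k correct
            gbj * gbk * P00)                               # both incorrect

def adjust_proportion(p_star, g):
    """p_j = (p*_j - g_j) / (1 - g_j)."""
    return (p_star - g) / (1.0 - g)

def adjust_joint(p11, p10, p01, p00, g_j, g_k):
    """Adjusted joint proportion p_jk from the observed (starred) 2 x 2 proportions."""
    gbj, gbk = 1.0 - g_j, 1.0 - g_k
    return p11 - (g_k / gbk) * p10 - (g_j / gbj) * p01 + (g_j * g_k) / (gbj * gbk) * p00

obs = observed_cells(0.40, 0.15, 0.10, 0.35, g_j=0.20, g_k=0.25)
print(adjust_joint(*obs, 0.20, 0.25))            # recovers P_jk = 0.40
print(adjust_proportion(obs[0] + obs[1], 0.20))  # recovers P_j = 0.40 + 0.15 = 0.55
print(adjust_proportion(0.15, 0.25))             # negative: an ad hoc floor is needed
```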



5. Generalized Least Squares Solutions

Section 4 presented formulas for the expected values of $p_j$,
or item proportions correct, and $p_{jk}$, or joint item proportions,
in terms of the parameters of the extended common factor model
(possibly after adjustment for prespecified rates of chance
success, as in Section 4.3). ULS estimation proceeds from these
formulas alone, minimizing a quantity that measures the similarity
between the data (sample percents correct and sample tetrachoric
correlations, the latter computed from sample joint proportions)
and a fitted facsimile of the data in terms of the parameters.
The similarity is judged by the sum of squared differences,
element by element, with each element weighted equally. More
efficient use of the data can be made by taking into account the
varying magnitudes and interrelationships of sampling error among
the elements. One approach by which this objective can be
achieved is generalized least squares (GLS).
5.1 Christoffersson's Solution

Let $P = (P_1, P_2, \ldots, P_p, P_{21}, \ldots, P_{jk}, \ldots)'$, with $1 \le k < j \le p$,
be the vector of the expected values of $p_j$ and $p_{jk}$, modeled as
functions of Λ and Γ, and let p be the corresponding vector of
observed values. When the model is correct, the quantity $e = p - P$
will follow a multivariate normal distribution in large
samples with expectation 0 and covariance matrix $\Sigma_e$.
Christoffersson (1975, Appendix 2) derives an expression for a
consistent estimator $S_e$ of $\Sigma_e$, and implements a GLS solution for
the parameters of the factor model by minimizing

$$F = (p - P)'\,S_e^{-1}(p - P). \qquad (5.1)$$

The solution thus obtained provides consistent parameter estimates.

A number of additional features of Christoffersson's solution
also merit comment at this point.
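
Equation 5.1 can be sketched in a few lines. The helper below stacks observed proportions in the order p_1, ..., p_p, p_21, p_31, p_32, ...; S_e is left as an input because Christoffersson's estimator is not reproduced here, and the identity matrix and the constant model vector in the usage lines are only placeholders, with which F reduces to an unweighted sum of squares.

```python
import numpy as np

def stacked_proportions(X):
    """p = (p_1, ..., p_p, p_21, p_31, p_32, ...)': item proportions correct followed by
    joint proportions p_jk for 1 <= k < j <= p, from an N x p matrix of 0/1 responses."""
    X = np.asarray(X, dtype=float)
    n_items = X.shape[1]
    singles = X.mean(axis=0)
    pairs = [np.mean(X[:, j] * X[:, k]) for j in range(n_items) for k in range(j)]
    return np.concatenate([singles, pairs])

def gls_discrepancy(p_obs, P_model, S_e):
    """Equation 5.1: F = (p - P)' S_e^{-1} (p - P), computed with a linear solve."""
    r = np.asarray(p_obs, dtype=float) - np.asarray(P_model, dtype=float)
    return float(r @ np.linalg.solve(S_e, r))

X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]  # hypothetical responses
p_obs = stacked_proportions(X)                    # length p(p + 1)/2 = 6
P_model = np.full(len(p_obs), 0.5)                # placeholder model values
print(p_obs, gls_discrepancy(p_obs, P_model, np.eye(len(p_obs))))
```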

First, his expressions for the elements of $S_e$ include not
only $p_j$ and $p_{jk}$ terms from the one-way and two-way margins of the $2^p$
raw data table, but also terms from the three- and four-way
margins, that is, joint proportions correct for items taken three
and four at a time. This means that the GLS solution is using
more information than the ULS solution but, by ignoring yet higher-level
interactions, still not all of the information available.
(As discussed in Section 6.2, the loss may be negligible.)

Second, statistical tests of model fit are available.
Asymptotically, F follows a chi-square distribution, with degrees
of freedom equal to p(p + 1)/2 minus the number of parameters in
$\Lambda$ and $\gamma$ estimated in the model (as in the previous section,
certain restrictions in $\Lambda$ are required to eliminate linear and
rotational indeterminacies). This test is usually of interest not so
much for itself (the model is not expected to fit) but for comparisons
between models with different numbers of factors. The difference
between the chi-squares for an m-factor and an (m + 1)-factor
solution for the same data also follows a chi-square distribution
in large samples when the m-factor model is correct, with degrees
of freedom equal to the number of additional parameters estimated
in the less restrictive solution. Indeed, the test of most interest
in educational and psychological applications is typically the
comparison of the one- and two-factor solutions.
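The degrees-of-freedom bookkeeping behind this difference test can be sketched as follows. The sketch assumes an exploratory m-factor model with p free thresholds and pm loadings, less the m(m - 1)/2 restrictions conventionally used to fix the rotation; that parameter count is an assumption consistent with, but not spelled out in, the text. The upper-tail probability uses the series for the regularized incomplete gamma function so that only the standard library is needed, and all function names are hypothetical.

```python
import math


def gls_df(p, m):
    """Degrees of freedom for an exploratory m-factor GLS fit to p items.

    Assumes p thresholds and p*m loadings are free, less m(m - 1)/2
    restrictions removing rotational indeterminacy.
    """
    n_moments = p * (p + 1) // 2                # p proportions + p(p - 1)/2 pairs
    n_params = p + p * m - m * (m - 1) // 2
    return n_moments - n_params


def chi2_sf(x, df):
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function evaluated by its power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:                 # terms rise, then decay to zero
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def compare_m_vs_m_plus_1(chi2_m, chi2_m1, p, m):
    """Chi-square difference test of m versus m + 1 factors."""
    diff = chi2_m - chi2_m1
    ddf = gls_df(p, m) - gls_df(p, m + 1)       # works out to p - m
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical chi-squares for one- and two-factor solutions with p = 10 items.
diff, ddf, p_value = compare_m_vs_m_plus_1(48.3, 31.0, p=10, m=1)
```

Under these assumptions the difference in degrees of freedom between m and m + 1 factors is p - m. The series loses relative accuracy for very small tail probabilities, so a library routine such as SciPy's `chi2.sf` would be preferable in practice.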

Third, standard errors of estimation are also available. In
large samples, the covariance matrix of estimation errors of the
free elements of $\Lambda$ and $\gamma$ is approximated by the inverse of the
matrix of second derivatives of F with respect to these parameters.
Standard errors for individual parameters are the square roots of the
corresponding diagonal elements. In exploratory work, these
standard errors are not of major interest: they apply to the
parameters only as estimated, not to rotated solutions. They prove
more interesting by way of contrast to those obtained in the
full-information maximum likelihood solution described in the next
section.

Fourth and finally, computational requirements are considerably
heavier than those of the ULS solution. The solution is iterative,
requiring the numerical evaluation of integrals of the form of
Equations 4.1 and 4.2 in each cycle. Further comment on this
point follows a discussion of Muthen's GLS solution, which is
asymptotically equivalent to Christoffersson's but somewhat
less burdensome.
5.2 Muthen's Solution 

Muthen's (1978) GLS solution bears more resemblance to the
ULS solution of the preceding section, as well as to the solutions
for measured variables; the fitting function again produces
estimates that, in the appropriate sense, make a fitted correlation
matrix similar to an observed one. Whereas Christoffersson
minimizes residuals in terms of the P's in Equation 5.1, Muthen
minimizes

$$F = (\mathbf{s} - \boldsymbol{\sigma})' \, S_\delta^{-1} \, (\mathbf{s} - \boldsymbol{\sigma}), \qquad (5.2)$$

where $\boldsymbol{\sigma} = (\gamma_1, \gamma_2, \ldots, \gamma_p, \sigma_{21}, \ldots, \sigma_{jk}, \ldots)'$
holds the thresholds and tetrachoric correlations implied by the
model, $\mathbf{s}$ holds the sample estimates of these quantities (i.e.,
the sample thresholds and sample tetrachorics), and $S_\delta$ is a
consistent estimator of the covariance matrix of
$\boldsymbol{\delta} = \boldsymbol{\sigma} - \mathbf{s}$. Muthen obtains an expression for $S_\delta$ from
Christoffersson's expression for $S_e$ by "linearizing" the model,
that is, by approximating the complex relationship between
$\boldsymbol{\sigma}$ and $\mathbf{P}$ by the initial terms of a Taylor series expansion.
Integrals of the form of Equations 4.1 and 4.2 then need be evaluated
only once. These procedures have been incorporated into the
computer program LISCOMP (Muthen, 1985).
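The first p entries of $\mathbf{s}$ are sample thresholds. A minimal sketch, assuming (as in Section 3) that an item is answered correctly when its standardized latent response exceeds $\gamma_j$, so that $p_j = 1 - \Phi(\gamma_j)$; the function name is hypothetical, and the sample tetrachorics, which require bivariate normal integrals, are not shown.

```python
from statistics import NormalDist


def sample_thresholds(prop_correct):
    """Sample thresholds gamma_j = Phi^{-1}(1 - p_j) from proportions correct p_j."""
    std_normal = NormalDist()  # N(0, 1)
    return [std_normal.inv_cdf(1.0 - p) for p in prop_correct]


gammas = sample_thresholds([0.50, 0.84, 0.25])
```

An item answered correctly by half the sample gets a threshold of 0, an easy item a negative threshold, and a hard item a positive one.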

Muthen's solution shares many of the other characteristics
of Christoffersson's, notably use of three- and four-way marginal
information, consistent estimates, standard errors, and tests of
fit. And although Muthen's solution is faster, practical
limitations arise from the same source, namely, the magnitude of
the matrix $S_e$. These effects are illustrated in Table 1.

Computing requirements under the GLS solution increase in
proportion to m and with the fourth power of p. About 25
items seems to be an upper limit with current machinery.

Insert Table 1 about here 
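The fourth-power growth follows from the size of the weight matrix: $\mathbf{e}$ has $p(p+1)/2$ elements, so $S_e$ has roughly $p^4/8$ distinct entries. A small illustrative count (the function name is hypothetical):

```python
def gls_weight_matrix_size(p):
    """Order k of S_e and its number of distinct elements k(k + 1)/2."""
    k = p * (p + 1) // 2           # length of e: p proportions + p(p - 1)/2 pairs
    return k, k * (k + 1) // 2     # S_e is symmetric


for p in (10, 25, 50):
    k, n_distinct = gls_weight_matrix_size(p)
    print(f"p = {p:3d}: S_e is {k} x {k}, {n_distinct} distinct elements")
```

Doubling p from 25 to 50 multiplies the count by about 15, close to $2^4$.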
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Muthen notes that in many cases, ULS estimates are reasonable 
approximations to GLS estimates . The superiority of GLS, through 
its use of three-* and four-way joint proportions, becomes more 
evident at* one attempts to extract more from the data, so to 
speak; that is, with other features held constant, in solutions 
with fewer examinees, fewer items, or more factors . 

6. A Maximum Likelihood Solution

The preceding sections have considered ULS and GLS
estimation of the parameters of a common factor model for
dichotomous responses. These are "limited information" solutions,
in that they utilize only information in lower-order margins of
the full $2^p$ contingency table that summarizes all responses, and
therefore all available information, for estimation. In this
section, we review a full-information solution, namely the
marginal maximum likelihood (ML) estimation introduced by Bock and
Aitkin (1981). (The Bock-Aitkin procedure extends an earlier
solution given by Bock and Lieberman (1970) for the
one-dimensional case.) The following discussion is based on this
approach, which has been implemented in the TESTFACT computer
program (Wilson, Wood, & Gibbons, 1983).
6.1 The Marginal Probability of a Response Pattern

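Assume again the common factor model for dichotomous items
given in Section 3, initially without the possibility of chance
success; that is, we posit m latent variables $\boldsymbol{\theta}$ and p > m
observed binary variables that take the values 1 or 0 in the
following manner:

$$x_{ij} = \begin{cases} 1 & \text{if } y_{ij} > \gamma_j \\ 0 & \text{if } y_{ij} \le \gamma_j \end{cases} \qquad (6.1)$$

where

$$y_{ij} = \lambda_{j1}\theta_{i1} + \cdots + \lambda_{jm}\theta_{im} + \varepsilon_{ij}.$$

The residual terms are independent over items and examinees,
and follow $N(0, \sigma_j^2)$ distributions, where

$$\sigma_j^2 = 1 - \sum_{s=1}^{m} \lambda_{js}^2.$$

Recalling Equation 3.2, this implies that

$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_i) = \Phi\!\left(\frac{\sum_{s} \lambda_{js}\theta_{is} - \gamma_j}{\sigma_j}\right) \equiv F_j(\boldsymbol{\theta}_i), \qquad (6.3)$$

where $\Phi$ is the cumulative standard normal distribution. It is
further assumed that $\boldsymbol{\theta} \sim MVN(\mathbf{0}, I_m)$, from which it follows
that $\mathbf{y} \sim MVN(\mathbf{0}, \Sigma)$, where

$$\Sigma = \Lambda\Lambda' + \Psi, \qquad (6.4)$$

with $\Psi = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$.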

It was shown that, under these assumptions, the probability
of a typical response pattern $\mathbf{x}_\ell = (x_{\ell 1}, x_{\ell 2}, \ldots, x_{\ell p})$ is given by

$$P_\ell = P(\mathbf{x} = \mathbf{x}_\ell) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \prod_{j=1}^{p} F_j(\boldsymbol{\theta})^{x_{\ell j}} \left[1 - F_j(\boldsymbol{\theta})\right]^{1 - x_{\ell j}} g(\boldsymbol{\theta}) \, d\theta_1 \cdots d\theta_m = \int_{\boldsymbol{\theta}} L_\ell(\boldsymbol{\theta}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta},$$

where $g(\boldsymbol{\theta})$ is the $MVN(\mathbf{0}, I_m)$ density. (We recall that the
possibility of chance successes at fixed rates $g_j$ may be incorporated
at this point by replacing $F_j(\boldsymbol{\theta})$ above with
$F_j^*(\boldsymbol{\theta}) = g_j + (1 - g_j) F_j(\boldsymbol{\theta})$.) This integral can be
approximated to any desired degree of accuracy by m-dimensional
Gauss-Hermite quadrature (Stroud & Secrest, 1966):

$$P_\ell \approx \tilde{P}_\ell = \sum_{k_m=1}^{q} \cdots \sum_{k_2=1}^{q} \sum_{k_1=1}^{q} L_\ell(\mathbf{X}_k) \, A(X_{k_1}) \, A(X_{k_2}) \cdots A(X_{k_m}),$$

where integration over real m-space has been replaced by summation
over a finite grid of $q^m$ quadrature points $\mathbf{X}_k = (X_{k_1}, \ldots, X_{k_m})$.
Because it has been assumed that the dimensions of $\boldsymbol{\theta}$ are orthogonal
in the population of interest, the weight assigned to each point
is the product of the weights associated with each coordinate $X_{k_s}$.
6.2 Estimation Procedures 

Consider the responses of a random sample of N examinees.
Under the assumptions given above, it follows that the counts $r_\ell$
of the s distinct response patterns follow a multinomial distribution
given by

$$P(\mathbf{r} \mid \Lambda, \gamma) = \frac{N!}{r_1! \, r_2! \cdots r_s!} \, P_1^{r_1} P_2^{r_2} \cdots P_s^{r_s}. \qquad (6.5)$$

The full-information maximum likelihood solution given by Bock
and Aitkin (1981) maximizes Equation 6.5 with respect to the
elements of $\Lambda$ and $\gamma$.

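It proves convenient computationally to rewrite the argument
of the normal probability function in Equation 6.3 in terms of
slopes $a_{js}$ and intercepts $c_j$ as follows:

$$\frac{\sum_s \lambda_{js}\theta_s - \gamma_j}{\sigma_j} = c_j + \sum_s a_{js}\theta_s, \qquad a_{js} = \frac{\lambda_{js}}{\sigma_j}, \quad c_j = -\frac{\gamma_j}{\sigma_j}.$$

From maximum likelihood estimates of the a's and c's, maximum
likelihood estimates of the $\gamma$'s and $\lambda$'s are obtained as

$$\gamma_j = -c_j / d_j \quad \text{and} \quad \lambda_{jk} = a_{jk} / d_j,$$

where

$$d_j = \left(1 + \sum_s a_{js}^2\right)^{1/2}.$$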

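Estimation proceeds by finding those values of a and c which
maximize Equation 6.5. This is done by taking the first
derivatives of the logarithm of the likelihood function, Equation
6.5, with respect to each parameter in turn, setting them to
zero, and solving with respect to a and c. The interested reader
is referred to Bock and Aitkin for details of the solution. The
essence of the approach, however, can be seen in the form of the
likelihood equations. For a typical parameter $u_j$ from item j
(either a slope or an intercept), we have

$$\frac{\partial \log L}{\partial u_j} = \sum_{k_m=1}^{q} \cdots \sum_{k_1=1}^{q} \frac{\bar{r}_{jk} - \bar{N}_k F_j(\mathbf{X}_k)}{F_j(\mathbf{X}_k)\left[1 - F_j(\mathbf{X}_k)\right]} \, \frac{\partial F_j(\mathbf{X}_k)}{\partial u_j} = 0, \qquad (6.6)$$

where

$$\bar{N}_k = \sum_{\ell=1}^{s} \frac{r_\ell \, L_\ell(\mathbf{X}_k) \, A(X_{k_1}) \cdots A(X_{k_m})}{\tilde{P}_\ell} \qquad (6.7)$$

is approximately proportional to the population density in the
region of quadrature point $\mathbf{X}_k$, and

$$\bar{r}_{jk} = \sum_{\ell=1}^{s} \frac{r_\ell \, x_{\ell j} \, L_\ell(\mathbf{X}_k) \, A(X_{k_1}) \cdots A(X_{k_m})}{\tilde{P}_\ell} \qquad (6.8)$$

is approximately proportional to the probability of a correct
response to item j from examinees with $\boldsymbol{\theta}$'s in this region. (An
application of Bayes' theorem will be recognized in Equations 6.7
and 6.8, yielding the posterior probability of ability given
$\mathbf{x}_\ell$, conditional on $\Lambda$ and $\gamma$.)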

Solution of these equations is iterative, since the terms $\bar{r}_{jk}$
and $\bar{N}_k$ depend on the parameters a and c themselves through
$L_\ell(\mathbf{X}_k)$. In a variation of an EM algorithm (Dempster, Laird, &
Rubin, 1977), Bock and Aitkin proceed in cycles with two steps each:
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E-step: Using provisional estimates a 1 and c C , evaluate 
Equations 6.7 and 6.8. These are the expected 
values of the population densities and item 
proportions correct in the regions of the 
quadrature points, conditional on the data and 
a and c • 

M-step: Taking the T ^ s and N^'s as known, solve 

Equations 6*6 with respect to the parameters 

. , t+1 j t+1 
to obtain a and c 
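The E-step can be sketched for the one-factor case (m = 1) with NumPy, assuming the distinct patterns are stored as a 0/1 array alongside their frequencies, and that the item response function is written in slope-intercept form, $F_j(X_k) = \Phi(c_j + a_j X_k)$. The function name and the choice q = 15 are illustrative, and the M-step (solving Equations 6.6) is not shown.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Standard normal CDF applied elementwise to an array.
_phi_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def e_step(patterns, counts, a, c, q=15):
    """One E-step for a one-factor (m = 1) probit model.

    patterns : (s, p) array of distinct 0/1 response patterns x_l
    counts   : (s,) array of pattern frequencies r_l
    a, c     : (p,) provisional slopes and intercepts
    Returns N_bar (q,) and r_bar (p, q) as in Equations 6.7 and 6.8.
    """
    nodes, w = hermegauss(q)
    w = w / np.sqrt(2.0 * np.pi)                                 # A(X_k), sums to 1
    F = _phi_cdf(c[:, None] + a[:, None] * nodes[None, :])       # F_j(X_k), (p, q)
    x = patterns[:, :, None]                                     # (s, p, 1)
    L = np.prod(np.where(x == 1, F[None], 1.0 - F[None]), axis=1)  # L_l(X_k), (s, q)
    P_tilde = L @ w                                              # quadrature P_l
    post = L * w[None, :] / P_tilde[:, None]                     # posterior weights
    N_bar = counts @ post                                        # Equation 6.7
    r_bar = (patterns * counts[:, None]).T @ post                # Equation 6.8
    return N_bar, r_bar
```

Because each row of the posterior weights sums to 1, the $\bar{N}_k$ sum to N and, for each item, the $\bar{r}_{jk}$ sum to the observed number of correct responses; these identities make convenient checks.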

Solving the so-called likelihood equations in this manner
yields saddle points or relative extrema of Equation 6.5. Whether
they are relative maxima can be determined by examining values of
the likelihood function in the region around the solution.
Whether a relative maximum is unique can be studied by iterating
from a number of different starting values.

As with the GLS solution, the ML solution provides standard
errors of estimation and statistical tests of fit. The covariance
matrix of estimation errors of the parameters is given by the
negative inverse of the matrix of expected second derivatives of the
log likelihood function; this may be approximated by the matrix of
second derivatives at the ML solution. Standard errors are obtained
as the square roots of the appropriate diagonal elements. For a
model with m factors, the likelihood ratio chi-square approximation
for a test against a general multinomial distribution is given by
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G 2 - -2 Z r A (log NP^/^) 

% 

with degrees of freedom equal to 2 P - p(ji + 1) + m(m - 0/2. 
This value reflects the number of cells in the full contingency 
table layout for the data, less the number of parameters estimated 
plus the number of constraints imposed to effect identification. 
Because the expected number of examinees per cell will usually be 
small for more than, say, 10 items, the approximation to the chi- 
ssuare distribution may be unreliable. Comparison of for 
nested models such an an m factor model versus m + 1 factor model, 
however, is more robust under these circumstances. 
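A minimal sketch of the $G^2$ computation, assuming the fitted pattern probabilities $\hat{P}_\ell$ have already been obtained (for instance by quadrature, as in Section 6.1). Patterns never observed ($r_\ell = 0$) contribute nothing to the sum. The function name is hypothetical.

```python
import math


def g_squared(counts, fitted_probs):
    """Likelihood-ratio statistic G^2 = -2 * sum_l r_l * log(N * P_l / r_l).

    counts       : observed frequencies r_l of the response patterns
    fitted_probs : fitted probabilities P_l for the same patterns
    """
    n = sum(counts)
    return -2.0 * sum(r * math.log(n * p / r)
                      for r, p in zip(counts, fitted_probs) if r > 0)
```

For nested m- and (m + 1)-factor fits to the same data, the difference of the two $G^2$ values is the statistic used in the more robust comparison described above.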

A comparison of the standard errors for estimated parameters
obtained from GLS and ML provides a measure of the loss of
information in GLS when joint information for more than four
items at a time is neglected. Comparisons reported by Gibbons
(1984a) indicate the differences are slight: not only were standard
errors comparable within .01 for a data set amenable to solution
by both ML and GLS, but similar parameter estimates and chi-square
values were also obtained.
6.3 ML Versus GLS

Given that both ML and GLS provide standard errors, tests of
fit, and comparable and consistent parameter estimates, it might
be asked whether one method is to be preferred over the other.
The answer is yes, at least with present computing machinery; the
computational algorithms of ML and GLS give each solution clear
advantages over the other under appropriate circumstances. As
noted in Section 5.2, the demands of GLS increase linearly with
the number of factors but with the fourth power of the number of
items. The numerical integration over the factor space required in
ML, on the other hand, implies geometric increases in computation
with the number of factors, although the item-by-item computations
required in the M-steps increase only linearly with the number of
items. The practical implications are these: ML is preferable for
long tests with few factors; GLS is preferable for short tests with
many factors; both are acceptable for short tests and few factors;
and at present, neither is very good for long tests and many
factors. (Bock (1984) quantifies the current meaning of the phrase
"many factors," saying that with 60 items, 1- to 3-factor models
are quite reasonable with ML, 4 factors are possible, and 5 is
about as much as is currently feasible.)
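The two growth rates can be contrasted with a rough count, assuming q = 10 quadrature points per dimension (an illustrative value, not a documented TESTFACT setting): an E-step evaluates each of p items at $q^m$ points, whereas the GLS weight matrix has $[p(p+1)/2]^2$ cells whatever the number of factors. Function names are hypothetical.

```python
def ml_estep_evaluations(p, m, q=10):
    """Item-by-point response-function evaluations per E-step: p * q**m."""
    return p * q ** m


def gls_weight_cells(p):
    """Cells of the square matrix S_e, whose order is p(p + 1)/2."""
    k = p * (p + 1) // 2
    return k * k


for m in range(1, 6):
    print(f"p = 60, m = {m}: {ml_estep_evaluations(60, m):>9,d} evaluations per E-step")
print(f"p = 60: S_e has {gls_weight_cells(60):,d} cells regardless of m")
```

Each added factor multiplies the ML count by q, while the GLS matrix is already in the millions of cells at 60 items; this is the trade-off summarized in the paragraph above.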

7. Further Extensions of the Models
The preceding sections of this review have considered the
extension of classical factor analysis to dichotomous variables,
concentrating on the basic models and on estimation procedures.
In this final section, we briefly survey a number of additional
directions in which these models may be further extended, and
direct the reader to work in progress in these areas.
7.1 Polytomous Responses

Discussion thus far has concentrated on analyses of
dichotomous data. Data received in the form of ratings on
$n_j$-point ordinal scales can also be addressed in much the same
manner, if it is reasonable to suppose that the data arise from
cut points on underlying continuous normal variables. Let the
probability of a response in a category less than or equal to
category k be given by

$$F_{jk}(\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \int_{-\infty}^{\gamma_{jk}} \exp\left[-\frac{1}{2}\left(\frac{y - \sum_s \lambda_{js}\theta_s}{\sigma_j}\right)^2\right] dy, \qquad k = 1, \ldots, n_j - 1,$$

and let $F_{j0}(\boldsymbol{\theta})$ be defined as 0 and $F_{jn_j}(\boldsymbol{\theta})$ as 1.
Then the probability of a response in category k is given by

$$P(x_{ij} = k \mid \boldsymbol{\theta}) = F_{jk}(\boldsymbol{\theta}) - F_{j,k-1}(\boldsymbol{\theta}).$$
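Since $F_{jk}(\boldsymbol{\theta}) = \Phi\big((\gamma_{jk} - \sum_s \lambda_{js}\theta_s)/\sigma_j\big)$, the category probabilities are differences of normal CDFs at successive cut points. A minimal sketch with the standard library (function and argument names are illustrative):

```python
from statistics import NormalDist


def category_probs(lam_j, cuts_j, theta):
    """P(x_ij = k | theta) for k = 1, ..., n_j under the normal ordinal model.

    lam_j  : loadings lambda_js (sum of squares below 1)
    cuts_j : increasing cut points gamma_j1 < ... < gamma_j,n_j-1
    theta  : latent values theta_s
    """
    phi = NormalDist().cdf
    sigma = (1.0 - sum(l * l for l in lam_j)) ** 0.5
    mean = sum(l * t for l, t in zip(lam_j, theta))
    cum = [0.0] + [phi((g - mean) / sigma) for g in cuts_j] + [1.0]  # F_j0 ... F_jn
    return [cum[k] - cum[k - 1] for k in range(1, len(cum))]
```

With ordered cut points each difference is nonnegative, and the probabilities telescope to 1.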



Under this model, either of two approaches toward parameter
estimation can be taken. Under ULS or GLS, one first estimates
the correlations among the supposed underlying MVN variables y;
these are called the sample polychoric correlations (Olsson,
Drasgow, & Dorans, 1982). From this point, estimation proceeds as
in the dichotomous case. Such solutions are provided in Joreskog
and Sorbom's (1984) LISREL program and Muthen's (1985) LISCOMP.
Under ML, solutions are available for both the unidimensional case
(Muraki, 1983; Thissen, 1984) and the multidimensional case
(Muraki, 1985). In principle, all of the extensions mentioned in
the following sections are applicable to polytomous response data.
7.2 Simultaneous Estimation of Asymptotes 

The marginal probability of a sample of response patterns
$\mathbf{x}_1, \ldots, \mathbf{x}_N$ was given in Section 3 as

$$P(\mathbf{x}_1, \ldots, \mathbf{x}_N) = \prod_{i=1}^{N} \int_{\boldsymbol{\theta}} \prod_{j=1}^{p} F_j(\boldsymbol{\theta})^{x_{ij}} \left[1 - F_j(\boldsymbol{\theta})\right]^{1 - x_{ij}} g(\boldsymbol{\theta}) \, d\boldsymbol{\theta}, \qquad (7.1)$$

where the item response functions $F_j(\boldsymbol{\theta})$ were given by either

$$F_j(\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \int_{\gamma_j}^{\infty} \exp\left[-\frac{1}{2}\left(\frac{y - \sum_s \lambda_{js}\theta_s}{\sigma_j}\right)^2\right] dy, \qquad (7.2)$$

the cumulative normal distribution, or by

$$F_j^*(\boldsymbol{\theta}) = g_j + (1 - g_j) F_j(\boldsymbol{\theta}), \qquad (7.3)$$

with $g_j$ a fixed constant indicating a possibly nonzero lower
asymptote for the probability of a correct response from even
examinees with low values of $\boldsymbol{\theta}$ in every component. Under ULS
and GLS estimation, use of Equation 7.3 rather than Equation 7.2 led
to adjustments of the observed proportions and pairwise proportions
of correct responses to items. Under ML estimation, the adjustment
for chance correct responses need not be limited to fixed values
$g_j$; in principle there is no reason that Equation 7.1 cannot be
maximized with respect to the g's as well as the a's and c's. One
simply includes additional likelihood equations, one for each $g_j$
(or only one if it is desired to estimate a common g for all
items), of the form given as Equation 6.6. This possibility is
currently under investigation by Bock and Muraki (1984).

Preliminary results reported by Muraki (1984) with fixed
asymptotes indicate caution may be required in interpreting the
results of such an endeavor. Muraki examined simulated responses
to 25 items from a randomly generated sample of 1000 subjects from
a standard normal population, with the true item response model
having one dimension and including an asymptote of .20 for all
items. In a preliminary analysis, a one-factor item response model
with estimated lower asymptotes was fit to the data using the BILOG
computer program (Mislevy & Bock, 1982) in order to obtain values
of g which could be input to common factor runs. Four different
factor models were then fit to these data by means of the ML
solution in the TESTFACT program:
1. One factor, g's fixed at zero.

2. One factor, g's fixed at nonzero preliminary estimates.

3. Two factors, g's fixed at zero.

4. Two factors, g's fixed at nonzero preliminary estimates.

It was found that models 2 and 3 both provided a good fit to the
data, as did model 4.

This finding suggests that the likelihood surface of a more 
general model that includes both 2 and 3, namely a two-factor 
model in which asymptotes are also estimated, is nearly equally 
high in regions around at least two possible parameter vectors, and 
may even exhibit relative maxima at these points. This finding is
not disturbing from a data-analytic point of view; it is not
surprising to have obtained a good fit from model 3, even though it
was not the model under which the data were generated, in view of
the fact that more parameters were estimated. Practical
considerations give one pause, however; two solutions (2 and 3) 
from the plausible general model (4) both explain the data nearly 
equally well, but have quite different implications for action. 
Without careful examination, the decision of whether or not to 
split the items into two different tests might depend on the 
starting values that one might happen to supply to the iterative 
solution. One must conclude, not surprisingly, that model fitting 
alone, without consideration of the nature of the data and the 
properties of the models being used, should not be the sole guide 
to test construction decisions. 
7.3 Bayesian Prior Distributions

As noted in Section 2, the occasional appearance of Heywood
solutions, or occurrences of zero or negative unique variances,
has led various researchers to incorporate Bayesian prior
distributions on these parameters (Lee, 1981; Martin & McDonald,
1975). Under the ML solution presented in Section 6, unique
variances do not appear as parameters to be estimated; their values
are implied by the values of the a's through

$$\sigma_j^2 = 1 - \sum_s \lambda_{js}^2 = \frac{1}{1 + \sum_s a_{js}^2}.$$

Under these circumstances, a Heywood solution takes the appearance
of one or more a's becoming infinite. To avoid this problem, it
might seem appropriate at first blush to impose prior distributions
on the a's. The difficulty, however, is that when the fit of
competing models is compared, the strengths of the priors imposed
on different solutions may vary as a function of the number of
parameters being estimated.

A more satisfactory solution, developed by Mislevy and Bock
and reported in Bock (1984), is to impose prior distributions on
the a's implicitly, by imposing them on the unique variances and
inferring the implied joint distribution of the a's through

$$\sigma_j^2 = 1 - \sum_s \frac{a_{js}^2}{1 + \sum_t a_{jt}^2}. \qquad (7.4)$$

Independent beta distributions on the unique variances are
proposed, with parameters (r, 1), where 1 < r < 2. This
distribution takes the form

$$p(\sigma_j^2 \mid r) = B^{-1}(r, 1) \, (\sigma_j^2)^{r-1}, \qquad (7.5)$$

where B(r, 1) represents the beta function. A choice of r near 1
results in a prior distribution that runs nearly flat across the
unit interval, but drops suddenly and steeply to zero as $\sigma_j^2$
approaches zero. This is tantamount to saying that one knows
little about the value of the unique variance, except that it is
not zero or negative. Substituting the expression in Equation 7.4
into Equation 7.5, we have a joint prior distribution for the
slope parameters of item j:

$$p(\mathbf{a}_j \mid r) = B^{-1}(r, 1) \left[1 - \sum_s \frac{a_{js}^2}{1 + \sum_t a_{jt}^2}\right]^{r-1}. \qquad (7.6)$$

Multiplying the marginal likelihood function, Equation 6.5, by the
prior distribution on the a's, Equation 7.6, yields an expression
proportional to the posterior distribution of the a's and c's,
with a diffuse prior distribution on the c's implicit. The result
is then maximized as in the straight maximum likelihood solution,
except that the maxima are now modal points of the posterior.
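On the log scale the prior simply adds a penalty to the log likelihood. A minimal sketch, using $B(r, 1) = 1/r$ and the algebraically equivalent form $\sigma_j^2 = 1/(1 + \sum_s a_{js}^2)$ for numerical stability; the value r = 1.2 and the function names are illustrative assumptions.

```python
import math


def log_beta_prior_on_slopes(a_j, r=1.2):
    """Log of Equation 7.6 for the slopes a_j of one item.

    Uses B(r, 1) = 1/r and the unique variance implied by Equation 7.4,
    written in the algebraically equal form 1 / (1 + sum_s a_js^2).
    """
    ss = sum(a * a for a in a_j)
    unique_var = 1.0 / (1.0 + ss)
    return math.log(r) + (r - 1.0) * math.log(unique_var)


def log_posterior(log_likelihood, slopes, r=1.2):
    """Unnormalized log posterior: log L plus one log prior term per item.

    The intercepts c_j carry an implicit flat prior, so they add nothing here.
    """
    return log_likelihood + sum(log_beta_prior_on_slopes(a_j, r) for a_j in slopes)
```

Because r - 1 > 0, the log prior tends to minus infinity as any slope grows without bound, which is exactly the Heywood direction the prior is meant to discourage.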

By similar methods, prior distributions could also be
introduced for $\Lambda$, $\gamma$, and, when included in the model, the
guessing parameter g. A fully Bayesian approach would allow for the
incorporation of prior knowledge about items and hypotheses about
their interrelationships. While such a treatment has yet to appear,
the stage has been well set: the marginal maximum likelihood
solution described in Section 6 provides a satisfactory starting
point for dealing with the likelihood term, and experience with
forms and procedures for prior distributions gained in the measured
variables case (e.g., Lee, 1982; Martin & McDonald, 1975)
appears readily transferable.
7.4 Relaxation of Distributional Assumptions 

The usual factor analytic formulation for discrete variables
assumes normal distributions both for the response functions (or
conditional distributions of y given $\boldsymbol{\theta}$) and for the distributions
of $\boldsymbol{\theta}$. These assumptions are motivated by convenience; the marginal
distribution resulting from the mixture (Equation 2.3) is itself
normal, simplifying to expressions of the type shown as Equations 2.4
and 2.5. There is no reason, however, not to consider other
distributional forms. Use of the logistic function for the
conditional distribution, for example, leads in the one-dimensional
case to certain item response models considered in Lord and Novick
(1968). Bartholomew (1980) suggests a model in which both
$f(y \mid \boldsymbol{\theta})$ and $g(\boldsymbol{\theta})$ are logistic; i.e.,

$$P(x_{ij} = 1 \mid \boldsymbol{\theta}) = \left[1 + \exp\left(-c_j - \sum_s a_{js}\theta_{is}\right)\right]^{-1}$$

and

$$P(\theta_1 \le x_1, \ldots, \theta_m \le x_m) = \prod_{k=1}^{m} \left[1 + \exp(-x_k)\right]^{-1}.$$

Due to the similarities in the shapes of the logistic and normal
distributions, results from this logit factor model can be expected
to agree well with results from the normal model discussed in the
preceding sections. Computations appear simpler under the logit
model in the one-dimensional case, but simpler under the normal in
the multivariate case.
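The closeness of the two shapes can be checked numerically. A minimal sketch, assuming the conventional scaling constant 1.702 from the item response literature (it is not stated in this paper): after rescaling, the logistic and normal ogives differ by less than .01 over the grid, whereas without rescaling the gap exceeds .1.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf
D = 1.702  # conventional logistic-to-normal scaling constant (an assumption here)


def logistic(z):
    """Standard logistic CDF, [1 + exp(-z)]^{-1}."""
    return 1.0 / (1.0 + math.exp(-z))


grid = [i / 100.0 for i in range(-600, 601)]  # z from -6 to 6
max_gap = max(abs(logistic(D * z) - phi(z)) for z in grid)
max_gap_unscaled = max(abs(logistic(z) - phi(z)) for z in grid)
print(f"max |logistic(1.702 z) - Phi(z)| = {max_gap:.4f}")
print(f"max |logistic(z) - Phi(z)|       = {max_gap_unscaled:.4f}")
```

In a logit factor model this constant is absorbed into the slopes, which is why results from the two models can be expected to agree closely up to a change of scale.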

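More restrictions than are actually needed to obtain an
identified model are still being imposed, however (see Bartholomew,
1980, 1984, 1985). Indeed, if the response functions are
sufficiently flexible, the distribution of $\boldsymbol{\theta}$ can be arbitrarily
specified within broad limits. Suppose that the marginal
distribution of response $\mathbf{y}$ is given as

$$p(\mathbf{y}) = \int_{\boldsymbol{\theta}} f(\mathbf{y} \mid \boldsymbol{\theta}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta},$$

where g is the continuous distribution of the latent variable $\boldsymbol{\theta}$.
Let $g^*$ be any other density over the same latent space that can be
obtained by suitable stretching, expanding, or rotation. That is,
$g^*$ is the density of $\boldsymbol{\theta}^* = h(\boldsymbol{\theta})$, where h is continuous and
strictly increasing in all components. Define $f^*$ by
$f^*(\mathbf{y} \mid \boldsymbol{\theta}^*) = f(\mathbf{y} \mid h^{-1}(\boldsymbol{\theta}^*))$. Then

$$p(\mathbf{y}) = \int_{\boldsymbol{\theta}} f(\mathbf{y} \mid \boldsymbol{\theta}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta} = \int_{\boldsymbol{\theta}^*} f^*(\mathbf{y} \mid \boldsymbol{\theta}^*) \, g^*(\boldsymbol{\theta}^*) \, d\boldsymbol{\theta}^*.$$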
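The identity can be verified numerically in one dimension. As a hypothetical example, take a probit response function on a standard normal $\theta$ and let $h = \Phi$, so that $\theta^* = \Phi(\theta)$ is uniform on (0, 1) and $f^*(1 \mid \theta^*) = f(1 \mid \Phi^{-1}(\theta^*))$. Both integrals, approximated here by the midpoint rule, reproduce the closed form $\Phi\big(c/\sqrt{1 + a^2}\big)$; the slope and intercept values are arbitrary.

```python
from statistics import NormalDist

N01 = NormalDist()


def f_correct(theta, a=1.2, c=-0.3):
    """Hypothetical probit response function f(1 | theta) on the normal scale."""
    return N01.cdf(c + a * theta)


def marginal_normal(n=4000, lo=-8.0, hi=8.0):
    """p(y = 1) = integral of f(1 | theta) g(theta) with g = N(0, 1), midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f_correct(lo + (i + 0.5) * h) * N01.pdf(lo + (i + 0.5) * h)
                   for i in range(n))


def marginal_uniform(n=4000):
    """Same probability after theta* = Phi(theta): g* is uniform on (0, 1) and
    f*(1 | theta*) = f(1 | Phi^{-1}(theta*))."""
    h = 1.0 / n
    return h * sum(f_correct(N01.inv_cdf((i + 0.5) * h)) for i in range(n))


print(marginal_normal(), marginal_uniform())
```

The response function changes shape under the transformation while the marginal probability does not, which is the sense in which the latent distribution can be specified arbitrarily when the response functions are flexible.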

This result suggests three ways by which distributional
assumptions in the normal model for categorical variables might be
relaxed.

First, one might wish to maintain the normal linear regression
model for the response functions, but allow the $\boldsymbol{\theta}$ distribution
to take forms other than the standard normal. The idea here would
be to maintain response functions similar in form to IRT models
contemplated for subsequent use, but avoid distortions in $\Lambda$ due
to additional and unnecessary assumptions about the shape of the
population distributions. Bock and Aitkin (1981) mention this
possibility, and methods for estimating latent distributions that
could be incorporated into the ML solution are found in Mislevy
(1984).
Second, one might choose to relax even further by fixing the $\boldsymbol{\theta}$
population distribution in some tractable manner (e.g., a uniform
density on the unit interval) but allowing very flexible or even
nonparametric forms for the response functions. The idea here
would be to obtain more detailed diagnostic information about items,
such as the presence of non-monotonic response functions. Work
along these lines has begun in the unidimensional case with
Winsberg, Thissen, and Wainer (1982), who fit spline functions to
item response data. Again, these extensions can be incorporated
into the ML solution in a straightforward manner.

Third, one can specify the form of $f(\mathbf{x} \mid \boldsymbol{\theta})$ to achieve desired
properties. The next subsection considers a line of work with
this motivation.

7.6 Foundations of Factor Analysis 

In a more general setting that includes the factor analysis
of categorical variables, Bartholomew (1980, 1984, 1985) began by
considering implications for the conditional distribution
$h(\boldsymbol{\theta} \mid \mathbf{x})$ (what one knows about the latent variables after
having observed the manifest variables) imposed by the choice of
the form of $f(\mathbf{x} \mid \boldsymbol{\theta})$. He shows that if (i) conditional or
local independence is satisfied, i.e.,

$$f(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{j=1}^{p} f_j(x_j \mid \boldsymbol{\theta}),$$

so that the m latent variables account completely for
relationships among the p manifest variables, and (ii) each
$f_j(x_j \mid \boldsymbol{\theta})$ belongs to the exponential family, i.e.,

$$f_j(x_j \mid \boldsymbol{\theta}) = G_j(\boldsymbol{\theta}) \, H_j(x_j) \exp\left[\sum_{k=1}^{m} \phi_{jk}(\boldsymbol{\theta}) \, u_{jk}(x_j)\right], \qquad (7.7)$$

with the special restriction that

$$\phi_{jk}(\boldsymbol{\theta}) = \gamma_{jk} + \alpha_{jk}\theta_k,$$

then there exists an m-dimensional sufficient statistic $\mathbf{X}$ for
$\boldsymbol{\theta}$, in the form of m functions of the p responses in $\mathbf{x}$:

$$X_k = \sum_{j=1}^{p} \alpha_{jk} \, u_{jk}(x_j), \qquad k = 1, \ldots, m.$$

If each $f_j$ is normal, Poisson, or binomial, then each $u_{jk}(x_j)$
is proportional to $x_j$. In the normal case introduced in Section 3,
the sufficient statistics are given by

$$\mathbf{X} = \Lambda' \Psi^{-1} \mathbf{x}.$$

Equivalently,

$$\mathbf{X} = H\boldsymbol{\theta} + H^{1/2}\mathbf{v},$$

where $H = \Lambda' \Psi^{-1} \Lambda$ and $\mathbf{v}$ is a random vector of
independent standardized variables. The sufficient statistics may
thus be thought of as a weighted average of the latent variables of
interest and residuals, the latter of which contain variation
specific to individual variables and random error.
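A minimal NumPy sketch of the normal-case statistic, assuming a diagonal $\Psi$ stored as a vector; the loadings and function names are hypothetical. The short simulation checks that $\mathbf{X} - H\boldsymbol{\theta}$ has covariance close to H, as the decomposition above implies, since $\mathrm{Var}(\Lambda'\Psi^{-1}\boldsymbol{\varepsilon}) = \Lambda'\Psi^{-1}\Lambda$.

```python
import numpy as np


def sufficient_statistic(lam, psi, x):
    """X = Lambda' Psi^{-1} x, with Psi = diag(psi); x may hold one case per column."""
    lam = np.asarray(lam, dtype=float)
    weights = lam / np.asarray(psi, dtype=float)[:, None]   # Psi^{-1} Lambda
    return weights.T @ np.asarray(x, dtype=float)


def h_matrix(lam, psi):
    """H = Lambda' Psi^{-1} Lambda."""
    lam = np.asarray(lam, dtype=float)
    return (lam / np.asarray(psi, dtype=float)[:, None]).T @ lam


# Illustrative loadings for p = 4 standardized variables and m = 2 factors.
lam = np.array([[0.7, 0.0], [0.6, 0.3], [0.0, 0.8], [0.5, 0.5]])
psi = 1.0 - (lam ** 2).sum(axis=1)
H = h_matrix(lam, psi)

rng = np.random.default_rng(0)
n = 200_000
theta = rng.standard_normal((n, 2))
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)
x = theta @ lam.T + eps                        # rows are x_i = Lambda theta_i + eps_i
X = sufficient_statistic(lam, psi, x.T).T      # one X_i per row
resid = X - theta @ H.T                        # should behave like H^{1/2} v
emp_cov = np.cov(resid, rowvar=False)          # close to H
```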

Attention is focused upon estimable linear combinations of
observed variables, which contain all the information in the data
about the latent variables. Bartholomew points out that these
statistics remain unchanged under monotonic transformations of any
coordinate of the latent distribution. It may be inferred that, in
the absence of additional external reasons to specify the exact
form of the latent marginal distribution g or the conditional
distribution f, factor analysis models provide at best ordinal
information within dimensions about persons' values on latent
variables. The marginal orderings are not invariant with respect
to rotation, so even ordinal information is conditional on the
arbitrary specification of orientation whenever m > 1.

The dependence of factor analytic solutions upon such
arbitrary choices as scaling and orientation of coordinates has
long been a source of dissatisfaction with analytic procedures.
A degree of specification on the form of f sufficient to eliminate
these indeterminacies in the case of binary variables is found in
Stegelmann's (1983) multidimensional Rasch model. In its general
form,

$$f_j(x_j = 1 \mid \boldsymbol{\theta}) = \left\{1 + \exp\left[-\left(\sum_{s=1}^{m} a_{js}\theta_s - \gamma_j\right)\right]\right\}^{-1}, \qquad (7.8)$$

where the $a_{js}$ take prespecified values of 1 or 0.

A submodel of Equation 7.7, Equation 7.8 leads to sufficient
statistics of the form

$$X_s = \sum_{j=1}^{p} a_{js} x_j, \qquad s = 1, \ldots, m.$$

Note that since the a's are prespecified, these are functions of
the data alone, not of parameters to be estimated. Rotational
indeterminacy is eliminated by the fixed values of the factor
loadings. Scaling indeterminacy is eliminated by Rasch's
requirement of "specific objectivity," i.e., that the marginal
likelihood $h(\mathbf{x} \mid \boldsymbol{\eta})$ be expressed in a form in which the
person parameter $\boldsymbol{\theta}$ can be separated from the item parameters
$\boldsymbol{\eta}$ as follows:

$$h(\mathbf{x} \mid \boldsymbol{\eta}) = \int f(\mathbf{x} \mid \boldsymbol{\theta}, \boldsymbol{\eta}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta} = \left[\mathrm{Prob}(\mathbf{x} \mid \mathbf{X}, \boldsymbol{\eta})\right] \times \left[\int p(\mathbf{X} \mid \boldsymbol{\theta}, \boldsymbol{\eta}) \, g(\boldsymbol{\theta}) \, d\boldsymbol{\theta}\right].$$

The only transformations of f and g that maintain this property
are linear; hence, interval-scaled measurement is assured, at
the cost of very strict assumptions about the form of f and the
values of a.
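Specific objectivity can be checked numerically in the unidimensional special case (m = 1, every $a_{j1} = 1$), where X is the number-correct score. A minimal sketch with hypothetical thresholds: the probability of a pattern, conditional on its score, comes out the same at every $\theta$, which is what permits the factorization above.

```python
import itertools
import math


def rasch_pattern_prob(x, theta, gammas):
    """P(x | theta) with P(x_j = 1 | theta) = 1 / (1 + exp(-(theta - gamma_j)))."""
    prob = 1.0
    for x_j, gamma_j in zip(x, gammas):
        p_j = 1.0 / (1.0 + math.exp(-(theta - gamma_j)))
        prob *= p_j if x_j == 1 else 1.0 - p_j
    return prob


def conditional_on_score(x, theta, gammas):
    """Prob(x | X = sum(x), theta), by brute force over patterns with the same score."""
    score = sum(x)
    same_score = [y for y in itertools.product((0, 1), repeat=len(x)) if sum(y) == score]
    denom = sum(rasch_pattern_prob(y, theta, gammas) for y in same_score)
    return rasch_pattern_prob(x, theta, gammas) / denom


gammas = [-1.0, 0.0, 0.5, 1.2]  # hypothetical item thresholds
for theta in (-2.0, 0.0, 1.5):
    print(theta, conditional_on_score((1, 0, 1, 0), theta, gammas))
```

The factor $\exp(\theta X)$ is common to every pattern with the same score and cancels from the ratio; with unequal, freely estimated slopes it would not cancel.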

7.7 Confirmatory Factor Analysis, Multiple Group Solutions, and
Structural Equations Modeling

The focus of this review has been on exploratory factor
analysis, in which it is not known a priori how many factors are
required to explain the data, much less their composition and
interrelationships. Hypotheses about such matters may be
entertained, however, and it proves useful to be able to fit
common factor models under which certain parameter elements
(factor loadings, unique variances, factor variances and
covariances) are set to predetermined values or constrained to
equal one another. By comparing chi-square indices of fit of
competing models, one could then test hypotheses suggested by the
content of the observed variables in light of psychological or
sociological theories. Joreskog (1969) describes maximum
likelihood procedures by which this may be accomplished in the
setting of measured variables. Similar procedures have been
developed for the setting of categorical variables by Gibbons
(1984b), using ML, and by Muthen (1978), using GLS.

Gibbons (1984a) and Muthen and Christoffersson (1981), again
working with ML and GLS, respectively, perform confirmatory factor
analysis over several examinee populations simultaneously. This
work is also an extension of procedures developed by Joreskog
(1971) and Sorbom (1974) for measured variables. The interest
here is in testing hypotheses about whether certain features of a
common factor model can be taken as invariant across populations,
e.g., whether factor loadings of items can be construed as
invariant, suggesting a similar framework for approaching a
questionnaire, while factor distributions and unique variances
differ from one group to the next, suggesting varying population
distributions and measurement precision.

Muthen (1979, 1984b) has extended Joreskog's work in yet
another area, namely that of modeling structural relationships
among latent variables (Joreskog, 1974, 1977; Joreskog & Sorbom,
1984). Not only are latent variables $\boldsymbol{\theta}$ posited to account for
interrelationships among manifest variables, but relationships in
the form of linear regression functions may be posited among
latent variables. Analyses may consider several populations
simultaneously, thus allowing a wide variety of hypotheses
about the relationships of variables within and between groups to
be studied.

8. Conclusion

Factor analyses of dichotomous data were first undertaken as
a diagnostic tool in test construction. Deficiencies in available
methods of analysis, mainly unweighted least squares factor
analysis of phi coefficients or tetrachoric correlations,
prevented these attempts from fulfilling their objectives
satisfactorily. In particular, these problems included
computational inaccuracies, failure of requisite assumptions, and
lack of rigorous statistical foundation. Recent developments of
generalized least squares (GLS) and maximum likelihood (ML)
procedures have overcome these problems, albeit at the cost of
heavier computational burden.

The developments reviewed here were intended to provide a 
conceptual framework and rigorous estimation procedures for the 
factor analysis of categorical data. They foreshadow two likely 
directions of future development. 

The first is the extension beyond factor analysis; the 
models, concepts, and estimation procedures are clearly applicable 
to a much broader class of problems involving categorical data. 
Muthen's models for structural equations among latent variables 
for categorical observations, and Gibbon's (1981) longitudinal
models for time-structured categorical data are cases in point. 
The second stems from Muraki's analysis of factor analytic
models in which guessing parameters are also estimated (Section 
7.2). It gives one pause to realize that two models, distinct 
beyond rotation and holding different implications for test 
construction, offer nearly equally good fit to a given data set. 
The limitations of purely exploratory factor analyses in the
classical tradition, when applied to categorical data (even after
conceptual and estimation problems have been resolved), are
apparent. Continued development can be expected, therefore, along
lines that allow the researcher to incorporate prior information 
and scientific hypotheses into the process at the stage of 
modeling, rather than interpreting results from a minimally 
restrictive model. Initial efforts along this line from the 
sampling statistics perspective are exemplified by the 
confirmatory and structural equations models discussed in Section 
7.7, and may be contemplated from the Bayesian perspective by the
approach sketched in Section 7.
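A small example shows what a guessing parameter adds to a factor-analytic item model. This is a minimal sketch that assumes a one-factor normal-ogive parameterization. The latent response is y* = lambda * theta + e with Var(e) = 1 - lambda**2, and a correct answer occurs when y* exceeds a threshold tau. The lower asymptote c plays the role of the guessing parameter. The function names and parameter values are hypothetical. The sketch covers only the response function, not Muraki's estimation procedure.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct(theta, loading, threshold, guessing=0.0):
    """P(x = 1 | theta) for a one-factor normal-ogive item with lower asymptote c.

    Assumed model: y* = loading * theta + e, Var(e) = 1 - loading**2,
    x = 1 when y* > threshold; the guessing floor c lifts the lower tail.
    """
    unique_sd = math.sqrt(1.0 - loading**2)
    ogive = normal_cdf((loading * theta - threshold) / unique_sd)
    return guessing + (1.0 - guessing) * ogive

# Hypothetical item: loading 0.7, threshold 0, with and without c = 0.2.
for theta in (-3.0, 0.0, 3.0):
    print(theta,
          round(p_correct(theta, 0.7, 0.0), 3),
          round(p_correct(theta, 0.7, 0.0, guessing=0.2), 3))
```

A nonzero c raises the probability of success for low-ability examinees. Without a guessing parameter in the model, that extra success can be absorbed by additional factors or altered loadings. This is one way two quite different models can fit the same data about equally well.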
Footnote

The exploratory nature of this use of factor analytic models,
and the implicit expectation of subsequent use of item response
models of similar forms, must be stressed here. That one
unidimensional model of a specified parametric form will not fit
a data set does not preclude the possibility that another
unidimensional model of a different form will. If the question
is whether the data can be explained in terms of any unidimensional
monotonic latent variable model, with conditional independence,
including ones quite different from the familiar and convenient IRT
models in current use, then the nonparametric approach found in
Rosenbaum (1984) is more appropriate.
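One necessary consequence of any unidimensional, monotone latent variable model with conditional independence is easy to check. Each item's success probability is a nondecreasing function of the same latent variable, so every pair of items has a nonnegative population covariance. The following is a minimal sketch of that weak check on hypothetical 0/1 data; the function name is also hypothetical. It is a simplification and not Rosenbaum's procedure. Sampling error can produce small negative values even when such a model holds, which is why the function takes a tolerance.

```python
import numpy as np

def negative_item_pairs(responses, tol=0.0):
    """Return (i, j, cov) for item pairs whose sample covariance is below -tol.

    responses: (examinees x items) array of 0/1 scores. Under any
    unidimensional, monotone model with conditional independence, every
    population covariance is >= 0, so clearly negative sample values are
    evidence against the whole class of such models.
    """
    x = np.asarray(responses, dtype=float)
    cov = np.cov(x, rowvar=False)
    n_items = cov.shape[0]
    return [
        (i, j, float(cov[i, j]))
        for i in range(n_items)
        for j in range(i + 1, n_items)
        if cov[i, j] < -tol
    ]

# Hypothetical data: items 0 and 1 move together, item 2 moves against them.
data = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
])
print(negative_item_pairs(data))
```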
Table 1

Numbers of Elements in GLS Factor Analysis

Number of    Number of elements in matrix    Number of elements in
variables    of tetrachoric correlations     error covariance matrix

        5                              10                         45
       10                              45                        990
       20                             190                     17,955
       40                             780                    303,810
       60                            1770                  1,565,565
       80                            3160                  4,991,220
      100                            4950                 12,248,775
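The counts in Table 1 can be reproduced with a few lines of arithmetic. With p variables there are p(p − 1)/2 distinct tetrachoric correlations. The error-covariance column follows k(k − 1)/2 for k such correlations; counting the k diagonal variances as well would give k(k + 1)/2 instead. The following is a minimal sketch; the function names are hypothetical. For 20 variables the formula gives 17,955.

```python
def n_correlations(p):
    """Distinct off-diagonal elements of a p x p correlation matrix."""
    return p * (p - 1) // 2

def n_error_cov_elements(p):
    """k(k - 1)/2 with k = number of correlations, matching the table's
    third column (adding the k diagonal variances would give k(k + 1)/2)."""
    k = n_correlations(p)
    return k * (k - 1) // 2

for p in (5, 10, 20, 40, 60, 80, 100):
    print(f"{p:>4} {n_correlations(p):>6,} {n_error_cov_elements(p):>12,}")
```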